Abstract. We study finite horizon optimal switching problems for hidden Markov chain models
with point process observations. The controller possesses a finite range of strategies and attempts
to track the state of the unobserved state variable using Bayesian updates over the discrete
observations. Such a model has applications in economic policy making, staffing under variable demand
levels and generalized Poisson disorder problems. We show regularity of the value function and
explicitly characterize an optimal strategy. We also provide an efficient numerical scheme and
illustrate our results with several computational examples.



An economic agent (henceforth the controller) observes a compound Poisson process $X$ with
arrival rate $\lambda$ and mark/jump distribution $\nu$. The local characteristics $(\lambda, \nu)$ of $X$ are determined
by the current state of an unobservable Markov jump process $M$ with finite state space
$E = \{1, \ldots, m\}$. More precisely, the characteristics are $(\lambda_i, \nu_i)$ whenever $M$ is at state $i$, for $i \in E$.

The objective of the controller is to track the state of $M$ given the information in $X$. To do so,
the controller possesses a range of policies $a$ in the finite alphabet $\mathcal{A} = \{1, \ldots, A\}$. The policies
are sequentially adopted starting from time 0 and until some fixed horizon $T < \infty$. The infinite
horizon case $T = +\infty$ is treated in Section 5.1. The selected policy $a$ leads to running costs
(benefits) at instantaneous rate

\[ \sum_{i \in E} c_i(a) \, 1_{\{M_t = i\}} \, dt. \]

The controller's overall strategy consists of a double sequence $(\tau_k, \xi_k)$, $k = 0, 1, 2, \ldots$, with $\xi_k \in \mathcal{A}$
representing the sequence of chosen policies and $0 = \tau_0 < \tau_1 < \cdots \le T$ representing the times
of policy changes (from now on termed switching times). We denote the entire strategy by the
right-continuous piecewise constant process $\xi : [0, T] \times \Omega \to \mathcal{A}$, with $\xi_t = \xi_k$ if $\tau_k \le t < \tau_{k+1}$, or
equivalently

\[ \xi_t = \sum_{k \ge 0} \xi_k \cdot 1_{[\tau_k, \tau_{k+1})}(t). \]

Beyond running benefits, the controller also faces switching costs in changing her policy, which
lead to inertia and hysteresis. If at time $t$ the controller changes her policy from $a$ to $b$ and $M_t = i$,
then an immediate cost $K_i(a, b)$ is incurred. The overall objective of the controller is to maximize
the total present value of all tracking benefits minus the switching costs, which is given by

\[ \int_0^T e^{-\rho t} \Big( \sum_{i \in E} c_i(\xi_t) \, 1_{\{M_t = i\}} \Big) dt - \sum_k e^{-\rho \tau_k} \Big( \sum_{i \in E} K_i(\xi_{\tau_k -}, \xi_{\tau_k}) \cdot 1_{\{M_{\tau_k} = i\}} \Big), \]

where $\rho \ge 0$ is the discount factor.

Since $M$ is unobserved, the controller must carry out a filtering procedure. We postulate
that she collects information about $M$ via a Bayesian framework. Let $\vec\pi = (\pi_1, \ldots, \pi_m) =
(\mathbb{P}\{M_0 = 1\}, \ldots, \mathbb{P}\{M_0 = m\})$ be the initial (prior) beliefs of the controller about $M$ and $\mathbb{P}^{\vec\pi}$ the
corresponding conditional probability law. The controller starts with beliefs $\vec\pi$, observes $X$,
updates her beliefs and adjusts her policy accordingly. Because only $X$ is observable, the strategy
$\xi$ should be determined by the information generated by $X$; namely, each $\tau_k$ must be a stopping
time of the filtration $\mathcal{F}^X$ of $X$. Similarly, the value of each $\xi_k$ is determined by the information
$\mathcal{F}^X_{\tau_k}$ revealed by $X$ until $\tau_k$. These notions and the precise updating mechanism will be formalized
in Section 2.3. We denote by $\mathcal{U}(T)$ the set of all such admissible strategies on a time interval
$[0, T]$. Since strategies with infinitely many switches would have infinite costs, we exclude them
from $\mathcal{U}(T)$.

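Starting with initial policy $a \in \mathcal{A}$ and beliefs $\vec\pi$, the performance of a given policy $\xi \in \mathcal{U}(T)$ is

\[ J^\xi(T, \vec\pi, a) = \mathbb{E}^{\vec\pi, a} \Big[ \int_0^T e^{-\rho t} \Big( \sum_{i \in E} c_i(\xi_t) \, 1_{\{M_t = i\}} \Big) dt - \sum_k e^{-\rho \tau_k} \Big( \sum_{i \in E} K_i(\xi_{k-1}, \xi_k) \cdot 1_{\{M_{\tau_k} = i\}} \Big) \Big]. \tag{1.2} \]

The first argument in $J^\xi$ is the remaining time to maturity. The optimization problem is to
compute

\[ U(T, \vec\pi, a) = \sup_{\xi \in \mathcal{U}(T)} J^\xi(T, \vec\pi, a), \tag{1.3} \]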

and, if it exists, find an admissible strategy $\xi^*$ attaining this value. In this paper we solve (1.3),
including giving a full characterization of an optimal control $\xi^*$ and a deterministic numerical
method for computing $U$ to an arbitrary level of precision. The solution proceeds in two steps:
an initial filtering step and a second optimization step. The inference step is studied in Section 2,
where we convert the optimal control problem with partial information (1.3) into an equivalent fully
observed problem in terms of the a posteriori probability process $\vec\Pi$. The process $\vec\Pi$ summarizes
the dynamic updating of the controller's beliefs about the Markov chain $M$ given her point process
observations. The explicit dynamics of $\vec\Pi$ are derived in Proposition 2.2, so that the filtering step
is completely solved. The main part of the paper then analyzes the resulting optimal switching
problem (2.6) in Sections 3 and 4.
To our knowledge, the finite horizon partially observed switching control problem (which might
be viewed as an impulse control problem in terms of $\xi$) defined in (1.3) has not been studied
before. However, it is closely related to optimal stopping problems with partially observable Cox
processes that have been extensively looked at starting with the Poisson disorder problems; see
e.g. Peskir and Shiryaev [2000, 2002], Bayraktar and Dayanik [2006], Bayraktar et al. [2006],
Bayraktar and Sezer [2006] . In particular, Bayraktar and Sezer [2006] solved the Poisson disorder 
problem when the change time has phase type prior distribution by showing that it is equivalent 
to an optimal stopping problem for a hidden Markov process (which has several transient states 
and one absorbing state) that is indirectly observed through a point process. Later Ludkovski 
and Sezer [2007] solved a similar optimal stopping problem in which all the states of the hidden 
Markov chain are recurrent. Both of these works can be viewed as a special case of (1.3), see 
Remark 3.2. Our model can also be viewed as the continuous-time counterpart of discrete-time 
sequential M-ary detection in hidden Markov models, a topic extensively studied in sequential 
analysis, see e.g. Tartakovsky et al. [2006], Aggoun [2003]. 

Filtering with point process observations is a well-studied area; let us mention the
work of Arjas et al. [1992], Ceci and Gerardi [1998] and the reference volume Elliott et al. [1995].
In our model we use the previous results obtained in Bayraktar and Sezer [2006], Ludkovski and 
Sezer [2007] to derive an explicit filter; this allows us then to focus on the separated fully-observed 
optimal switching problem using the new hyper-state. Let us also mention the recent paper of 
Chopin and Varini [2007] who study a simulation-based method for filtering in a related model, 
but where an explicit filter is unavailable and must be numerically approximated. 

The techniques that we use to solve the optimal switching/ impulse control problem are different 
from the ones used in the continuous-time optimal control problems mentioned above. The main 
tool in solving the optimal stopping problems (in the multi-dimensional case, the tools in the one 
dimensional case are not restricted to the one described here) is the approximating sequence that is 
constructed by restricting the time horizon to be less than the time of the n-th observation/jump 
of the observed point process. This sequence converges to the value function uniformly and 
exponentially fast. However, in the impulse control problem, the corresponding approximating 
sequence is constructed by restricting the sum of the number of jumps and interventions to be 
less than n. This sequence converges to the value function; however, the uniform convergence in
both $T$ and $\vec\pi$ is not identifiable using the same techniques.

As in Costa and Davis [1989] and Costa and Raymundo [2000] (also see Mazziotto et al. [1988]
for general theory of impulse control of partially observed stochastic systems), we first
characterize the value function $U$ as the smallest fixed point of two functional operators and obtain the
aforementioned approximating sequence. Using one of these characterization results and the path
properties of the a posteriori probability process we obtain one of our main contributions: the
regularity of the value function $U$. We show that $U$ is convex in $\vec\pi$, Lipschitz in the same variable on
the closure of its domain, and Lipschitz in the $T$ variable uniformly in $\vec\pi$. Our regularity analysis
leads to the proof of the continuity of $U$ in both $T$ and $\vec\pi$, which in turn lets us explicitly describe
an optimal strategy.

The other characterization of $U$ as a fixed point of the first jump operator is used to numerically
implement the optimal solution and find the value function. In general, very little is known about 
numerics for continuous-time control of general hidden Markov models, and this implementation 
is another one of our contributions. We combine the explicit filtering equations together with 
special properties of piecewise deterministic processes [Davis, 1993] and the structure of general 
optimal switching problems to give a complete computational scheme. Our method relies only on 
deterministic optimization sub-problems and lets us avoid having to deal with first order quasi- 
variational inequalities with integral terms that appear in related stochastic control formulations 
(see Remark 3.3 below). We illustrate our approach with several examples on a finite/infinite
horizon and a hidden Markov chain with two or three states. 

Our framework has wide-ranging applications in operations research, management science and 
applied probability. Specific cases are discussed in the next subsection. As these examples demon- 
strate, our approach leads to sensible policy advice in many scenarios. Most of the relevant applied 
literature treats discrete-time stationary problems, and our model can be seen as a finite-horizon, 
continuous-time generalization of these approaches. 

The rest of the paper is organized as follows: In Section 1.1 we propose some applications of our 
modeling framework. In Section 2 we describe an equivalent fully observed problem in terms of
the a posteriori probability process $\vec\Pi$. We also analyze the dynamics of $\vec\Pi$. In Section 3 we show
that $U$ satisfies two different dynamic programming equations. The results of Section 3 along with
the path description of $\vec\Pi$ allow us to study the regularity properties of $U$ and describe an optimal
strategy in Section 4. Our model can be extended beyond (1.3), in particular to cover the case of
infinite horizon and the case in which the costs are incurred at arrival times. The extensions are 
described in Section 5. Extensive numerical analysis of several illustrative examples is carried out 
in Section 6. 

1.1. Applications. In this section we discuss case studies of our model and the relevant applied 
literature. 

1.1.1. Cyclical Economic Policy Making. The economic business cycle is a basis of many policy 
making decisions. For instance, the country's central bank attempts to match its monetary policy, 
so as to have low interest rates in periods of economic recession and high interest rates when the 
economy overheats. Similarly, individual firms will time their expenditures to coincide with boom 
times and will cut back on capital spending in unfavorable economy states. Finally, investors 
hope to invest in the bull market and stay on the sidelines during the bear market. In all these 
cases, the precise current economy state is never known. Instead, the agents collect information via 
economic events, surveys and news, and act based on their dynamic beliefs about the environment.
Typically, such news consists of discrete events (e.g. earnings pre-announcements, geo-political
news, economic polls) which cause instantaneous jumps in agents' beliefs. Thus, it is natural to
model the respective information structure by observations of a modulated compound Poisson
process. Accordingly, let $M$ represent the current state of the economy and let the observation $X$
correspond to economic news. Inability to correctly identify $M$ will lead to (opportunity) costs
$c_{M_s}(\xi_s)$. Hence, one may take $\mathcal{A} = E$ and $c_a(a) = 0$, $c_a(b) < 0$. The strategy $\xi$ represents the
set of possible actions of the agent. The switching costs of the form $K(\xi_s, \xi_{s-}) > 0$ correspond to
the costly influence of the Federal Reserve changing its interest rate policy, or to the transaction
costs incurred by the investor who gets in/out of the market. Depending on the particular setting, 
one may study this problem both in finite- and infinite-horizon setting, and with or without 
discounting. For instance, a firm planning its capital budgeting expenses might have a fixed 
horizon of one year, while a central bank has infinite horizon but discounts future costs. A 
corresponding numerical example is presented in Section 6.2. 

1.1.2. Matching Regime-Switching Demand Levels. Many customer-oriented businesses experience
stochastically fluctuating demand. Thus, internet servers face heavy/light traffic; manufacturing 
managers observe cyclical demand levels; customer service centers have varying frequencies of 
calls. Such systems can be modeled in terms of a compound Poisson request process X modulated 
by the partially known system state M. Here, X serves the dual role of representing the actual 
demands and conveying information about M. The objective of the agent is to dynamically choose 
her strategy $\xi$, so as to track the current demand level. For instance, an internet server receives
asynchronous requests $Y_\ell$, $\ell = 1, 2, \ldots$ (corresponding to jumps of $X$) that take $c(Y_\ell, \xi_t)$ time
units to fulfill. The rate of requests and their complexity distribution depend on $M$. In turn, the
server manager can control how much processing power is devoted to the server: more processors
cut down individual service times but lead to higher fixed overhead. Such a model effectively
corresponds to a controlled $M(\lambda)/G/\infty$-queue, where the arrival rate $\lambda$ is $M$-modulated, and where
the distribution of service times depends both on $M$ and the control $\xi$. A related computational
example concerning a customer call center is treated in Section 6.3. 

A concrete example that has been recently studied in the literature is the insurance premium 
problem. Insurance companies handle claims in exchange for policy premiums. A standard model 
asserts that claims $Y_1, Y_2, \ldots$ form a compound (time-inhomogeneous) Poisson process $X$. Suppose
that the rate of claims is driven by some state variable M that measures the current background 
risk (e.g. climate, health epidemics, etc.), with the latter being unobserved directly. In Aggoun 
[2003], such a model was studied (in discrete time) from the inference point of view, deriving 
the optimal filter for the insurance environment M given the claim process. Assume now that 
the company can control its continuous premium rate $c^2(\xi_t)$, as well as its deductible level $c^1(\xi_t)$.
High deductibles require lowering the premium rate, and are therefore only optimal in high-risk
environments. Furthermore, changes to policy provisions (which have a finite expiration date $T$)
are costly and should be undertaken infrequently. The overall objective is thus to compute

\[ \sup_{\xi \in \mathcal{U}(T)} \mathbb{E}^{\vec\pi, a} \Big[ \sum_{j=1}^{N_T} e^{-\rho \sigma_j} \big( Y_j - c^1(\xi_{\sigma_j}) \big)^+ + \int_0^T c^2(\xi_t) \, dt - \sum_k e^{-\rho \tau_k} \Big( \sum_{i \in E} K_i(\xi_{k-1}, \xi_k) \cdot 1_{\{M_{\tau_k} = i\}} \Big) \Big], \]



where N is the counting process for the number of claims. The resulting cost structure, which is 
a variant of (1.3), is described in Section 5.2. 



1.1.3. Security Monitoring. Classical models of security surveillance (radar, video cameras, com- 
munication network monitor) involve an unobserved system state M representing current security 
(e.g. $E = \{0, 1\}$, where 0 corresponds to a 'normal' state and 1 represents a security breach) and
a signal $X$. The signal $X$ records discrete events, namely artifacts in the surveyed space (radar
alarms, camera movement, etc.). Benign artifacts are possible, but the intensity $\lambda$ of $X$ increases
when $M_t = 1$. If the signal can be decomposed into further sub-types, then $X$ becomes a marked
point process with marks $(Y_\ell)$. The goal of the monitor is to correctly identify and respond to
security breaches, while minimizing false alarms and untreated security violations. Classical for- 
mulations [Tartakovsky et al., 2006, Peskir and Shiryaev, 2000] only analyze optimality of the first 
detection. However, in most practical problems the detection is ongoing and discrete announce- 
ment costs require studying the entire (infinite) sequence of detection decisions. Accordingly, our 
optimal switching framework of (1.3) is more appropriate. 

As a simplest case, the monitor can either declare the system to be sound, $\xi_t = 1$, or declare a
state of alarm, $\xi_t = 2$. This produces $M$-dependent penalty costs at rate $\sum_{j \in E} c_j(\xi_t) 1_{\{M_t = j\}} \, dt$; also,
changing the monitor state is costly and leads to costs $K$. A typical security system is run on an
infinite loop and one wishes to minimize total discounted costs, where the discounting parameter
$\rho$ models the effective time-horizon of the controller (i.e. the trade-off between the myopically
optimal announcement and long-run costs). Such an example is presented in Section 6.1. 

1.1.4. Sequential Poisson Disorder Problems. Our model can also serve as a generalization of
Poisson disorder problems [Bayraktar et al., 2006, Peskir and Shiryaev, 2002]. Consider a simple
Poisson process $X$ whose intensity $\lambda$ sequentially alternates between $\lambda_0$ and $\lambda_1$. The goal of the
observer is to correctly identify the current intensity; doing so produces a running reward at
rate $c_0(\xi_t)$ per unit time, otherwise a cost at rate $c_1(\xi_t)$ is assessed, where $\xi$ is the control process.
Whenever the observer changes her announcement, a fixed cost $K$ is charged in order to make sure
that the agent does not vacillate. Letting $M$, $M_t \in \{0, 1\}$, denote the intensity state, and $\mathcal{A} = \{0, 1\}$,
this example yet again fits into the framework of (1.3). Obvious generalizations to multiple values
of $\lambda$ and multiple announcement options for the observer can be considered. Again, one may study
the classical infinite-horizon problem, or the harder time-inhomogeneous model on finite-horizon,
where the observer must also take into account time-decay costs.

2. Problem Statement 

In this section we rigorously define the problem statement and show that it is equivalent to a
fully observed impulse control problem using the conditional probability process $\vec\Pi$. We then derive
explicitly the dynamics of $\vec\Pi$. First, however, we give a construction of the probability measure $\mathbb{P}$
and the formal description of $X$.

2.1. Observation Process. Let $(\Omega, \mathcal{H}, \mathbb{P}_0)$ be a probability space hosting two independent
elements: (i) a continuous time Markov process $M$ taking values in a finite set $E$ and with
infinitesimal generator $Q = (q_{ij})_{i,j \in E}$, (ii) a compound Poisson process $X$ with intensity $\lambda_1$ and jump
size distribution $\nu_1$ on $\mathbb{R}^d$. Let $\mathbb{F} = \{\mathcal{F}^X_t\}_{t \ge 0}$ be the natural filtration of $X$ enlarged by $\mathbb{P}_0$-null sets,
and consider its initial enlargement $\mathbb{G} = \{\mathcal{G}_t\}_{t \ge 0}$ with $\mathcal{G}_t = \sigma(\mathcal{F}^X_t, \sigma(\{M_t\}_{t \ge 0}))$ for all $t \ge 0$. The
filtration $\mathbb{G}$ summarizes the information flow of a genie that observes the entire path of $M$ at time
$t = 0$.

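Denote by $\sigma_0, \sigma_1, \ldots$ the arrival times of the process $X$,

\[ \sigma_\ell = \inf\{t > \sigma_{\ell-1} : X_t \ne X_{t-}\}, \quad \ell \ge 1, \quad \text{with } \sigma_0 = 0, \]

and by $Y_1, Y_2, \ldots$ the $\mathbb{R}^d$-valued marks observed at these arrival times:

\[ Y_\ell = X_{\sigma_\ell} - X_{\sigma_\ell -}, \quad \ell \ge 1. \]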

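Then in terms of the counting random measure

\[ p((0, t] \times A) = \sum_{\ell = 1}^{\infty} 1_{\{\sigma_\ell \le t\}} \, 1_{\{Y_\ell \in A\}}, \tag{2.1} \]

where $A$ is a Borel set in $\mathbb{R}^d$, we can write the observation process $X$ as

\[ X_t = X_0 + \int_{(0, t] \times \mathbb{R}^d} y \, p(ds, dy). \]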

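Let us introduce the positive constants $\lambda_2, \ldots, \lambda_m$ and the distributions $\nu_2, \ldots, \nu_m$. We also
define the total measure $\nu = \nu_1 + \ldots + \nu_m$, and let $f_i(\cdot)$ be the density of $\nu_i$ with respect to $\nu$.
Define

\[ R(t, y) = \frac{1}{\lambda_1 f_1(y)} \sum_{i \in E} 1_{\{M_t = i\}} \lambda_i f_i(y), \quad t \ge 0, \ y \in \mathbb{R}^d, \]

and denote the $(\mathbb{P}_0, \mathbb{G})$- (or $(\mathbb{P}_0, \mathbb{F})$-) compensator of $p$ by

\[ p_0((0, t] \times A) = \lambda_1 t \int_A f_1(y) \, \nu(dy), \quad t \ge 0, \ A \in \mathcal{B}(\mathbb{R}^d). \tag{2.2} \]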
We will use $R(t, y)$ and $p_0$ to change the underlying probability measure to a new probability
measure $\mathbb{P}$ on $(\Omega, \mathcal{H})$ defined by

\[ \frac{d\mathbb{P}}{d\mathbb{P}_0} \Big|_{\mathcal{G}_t} = Z_t, \]

where the stochastic exponential $Z$ given by

\[ Z_t = \exp\Big( \int_{(0, t] \times \mathbb{R}^d} \log(R(s, y)) \, p(ds, dy) - \int_{(0, t] \times \mathbb{R}^d} [R(s, y) - 1] \, p_0(ds, dy) \Big) \]

is a $(\mathbb{P}_0, \mathbb{G})$-martingale. Note that $\mathbb{P}$ and $\mathbb{P}_0$ coincide on $\mathcal{G}_0$ since $Z_0 = 1$; therefore the law of the
Markov chain $M$ is the same under both probability measures. Moreover, the $(\mathbb{P}, \mathbb{G})$-compensator
of $p$ becomes

\[ p_1((0, t] \times A) = \sum_{i \in E} \int_0^t 1_{\{M_s = i\}} \lambda_i \int_A f_i(y) \, \nu(dy) \, ds, \tag{2.3} \]



see e.g. Jacod and Shiryaev [1987]. The last statement is equivalent to saying that under this new
probability, $X$ has the form

\[ X_t = X_0 + \int_0^t \sum_{i \in E} 1_{\{M_s = i\}} \, dX^{(i)}_s, \tag{2.4} \]

in which $X^{(1)}, \ldots, X^{(m)}$ are independent compound Poisson processes with intensities and jump
size distributions $(\lambda_1, \nu_1), \ldots, (\lambda_m, \nu_m)$, respectively. Such a process $X$ is called a Markov-modulated
Poisson process [Karlin and Taylor, 1981]. By construction, the observation process $X$ has
independent increments conditioned on $M = \{M_t\}_{t \ge 0}$. Thus, conditioned on $\{M_{\sigma_\ell} = i\}$, the distribution
of $Y_\ell$ is $\nu_i(\cdot)$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$.
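
To make the construction concrete, the following minimal Python sketch simulates one path of the hidden chain $M$ and of the marked arrivals $(\sigma_\ell, Y_\ell)$ on $[0, T]$ using competing exponential clocks: in state $i$ the next event occurs after an exponential time with rate $-q_{ii} + \lambda_i$ and is an arrival with probability $\lambda_i / (\lambda_i - q_{ii})$. The function name `simulate_mmpp`, the two-state generator, the rates and the exponential mark distributions are illustrative assumptions and are not taken from the paper.

```python
import random

def simulate_mmpp(Q, lam, mark_samplers, m0, T, rng):
    """Simulate the hidden chain M and the arrivals (sigma_l, Y_l) of X on [0, T].

    Q: generator matrix (rows sum to zero); lam[i] > 0: arrival rate in state i;
    mark_samplers[i](rng): draws one mark from nu_i; m0: initial state of M.
    """
    t, state = 0.0, m0
    chain_path = [(0.0, m0)]   # (jump time of M, new state)
    arrivals = []              # (arrival time, mark)
    while True:
        exit_rate = -Q[state][state]
        total_rate = exit_rate + lam[state]
        t += rng.expovariate(total_rate)      # time of the next event of either kind
        if t >= T:
            break
        if rng.random() < lam[state] / total_rate:
            # an arrival of X: the mark is drawn from nu_i for the current state i
            arrivals.append((t, mark_samplers[state](rng)))
        else:
            # a jump of M: target chosen proportionally to the off-diagonal rates
            targets = [j for j in range(len(lam)) if j != state]
            weights = [Q[state][j] for j in targets]
            state = rng.choices(targets, weights=weights)[0]
            chain_path.append((t, state))
    return chain_path, arrivals

# Assumed two-state example: state 1 produces more frequent and larger marks.
Q = [[-1.0, 1.0], [2.0, -2.0]]
lam = [1.0, 5.0]
mark_samplers = [lambda r: r.expovariate(1.0), lambda r: r.expovariate(0.5)]
chain_path, arrivals = simulate_mmpp(Q, lam, mark_samplers, 0, 10.0, random.Random(0))
```

Only `arrivals` would be visible to the controller; `chain_path` plays the role of the information available to the genie filtration $\mathbb{G}$.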

2.2. Equivalent Fully Observed Problem. Let $D = \{\vec\pi \in [0, 1]^m : \pi_1 + \ldots + \pi_m = 1\}$ be the
space of prior distributions of the Markov process $M$. Also, let $\mathcal{S}(s) = \{\tau : \mathbb{F}\text{-stopping time}, \ \tau \le s, \ \mathbb{P}\text{-a.s.}\}$
denote the set of all $\mathbb{F}$-stopping times smaller than or equal to $s$.

We define the $D$-valued conditional probability process $\vec\Pi(t) = (\Pi_1(t), \ldots, \Pi_m(t))$ such that

\[ \Pi_i(t) = \mathbb{P}\{M_t = i \mid \mathcal{F}^X_t\}, \quad \text{for } i \in E \text{ and } t \ge 0. \tag{2.5} \]

Each component of $\vec\Pi$ gives the conditional probability that the current state of $M$ is $\{i\}$ given the
information generated by $X$ until the current time $t$. Using the process $\vec\Pi$ we now convert (1.3)
into a standard optimal stopping problem.

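Proposition 2.1. The performance of a given strategy $\xi \in \mathcal{U}(T)$ can be written as

\[ J^\xi(T, \vec\pi, a) = \mathbb{E}^{\vec\pi, a} \Big[ \int_0^T e^{-\rho t} C(\vec\Pi_t, \xi_t) \, dt - \sum_k e^{-\rho \tau_k} K(\xi_{\tau_{k-1}}, \xi_{\tau_k}, \vec\Pi_{\tau_k}) \Big], \tag{2.6} \]

in terms of the functions

\[ C(\vec\pi, a) = \sum_{i \in E} c_i(a) \pi_i, \quad \text{and} \quad K(a, b, \vec\pi) = \sum_{i \in E} K_i(a, b) \pi_i. \tag{2.7} \]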



Proposition 2.1 above states that solving the problem in (1.3) is equivalent to solving an impulse
control problem with state variables $\vec\Pi$ and $\xi$. As a result, the filtering and optimization steps are
completely separated. In our context with optimal switching control, the proof of this separation 
principle is immediate (see e.g. Shiryaev [1978, pp. 166-167]). In more general problems with 
continuous controls, the result is more delicate, see Ceci and Gerardi [1998]. 

We proceed to discuss the technical assumptions on $C$ and $K$. Note that by construction $C(\cdot, a)$
and $K(a, b, \cdot)$ are linear. Moreover, $C$ is bounded since $E$ is finite, so there is a constant denoted
$c = \max_{i \in E, \, a \in \mathcal{A}} |c_i(a)|$ that uniformly bounds possible rates of profit, $|C(\vec\pi, a)| \le c$. For the switching
costs $K$ we assume that they satisfy the triangle inequality

\[ K_i(a, b) + K_i(b, c) \ge K_i(a, c), \quad \text{and} \quad K_i(a, b) > k_0 > 0 \quad \text{for } i \in E; \ a, b, c \in \mathcal{A}. \]

By the above assumptions on the switching costs and because possible rewards are uniformly
bounded, with probability one the controller only makes finitely many switches and she does not
make two switches at once. Without loss of generality we will also assume that every
$\xi \in \mathcal{U}(T)$ satisfies

\[ \mathbb{E}^{\vec\pi, a} \Big[ \sum_k e^{-\rho \tau_k} K(\xi_{\tau_{k-1}}, \xi_{\tau_k}, \vec\Pi_{\tau_k}) \Big] < \infty. \tag{2.8} \]

Otherwise, the value associated with a strategy $\xi$ would be $-\infty$ since

\[ \mathbb{E}^{\vec\pi, a} \Big[ \int_0^T e^{-\rho t} |C(\vec\Pi(t), \xi_t)| \, dt \Big] \le cT, \]

and taking no action would be better than applying $\xi$.
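
The standing assumptions on the switching costs are easy to verify mechanically for a given cost table. The sketch below is a minimal illustration; the function name `check_switching_costs`, the nested-list layout `K[i][a][b]` and the numerical tables are assumptions made for the example, and diagonal entries $K_i(a, a)$ are ignored because they play no role in the assumptions.

```python
from itertools import permutations

def check_switching_costs(K, k0):
    """Check K_i(a, b) > k0 > 0 for a != b and the triangle inequality
    K_i(a, b) + K_i(b, c) >= K_i(a, c) for distinct a, b, c, in every state i.

    K[i][a][b] is the cost of switching from policy a to policy b when M = i.
    """
    if k0 <= 0:
        return False
    for K_i in K:                               # one cost table per hidden state
        policies = range(len(K_i))
        if any(K_i[a][b] <= k0 for a, b in permutations(policies, 2)):
            return False
        if any(K_i[a][b] + K_i[b][c] < K_i[a][c]
               for a, b, c in permutations(policies, 3)):
            return False
    return True

# Assumed examples with three policies.
K_ok = [[[0, 1, 2], [1, 0, 1], [2, 1, 0]]] * 2   # same table in both hidden states
K_bad = [[[0, 1, 5], [1, 0, 1], [5, 1, 0]]]      # 0 -> 1 -> 2 is cheaper than 0 -> 2
```

In `K_bad` the indirect route $0 \to 1 \to 2$ costs 2 while the direct switch costs 5, which is exactly the situation the triangle inequality rules out.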

In the sequel we will also make use of the following auxiliary problems. First, let $U_0$ be the
value of no-action, i.e.,

\[ U_0(T, \vec\pi, a) = \mathbb{E}^{\vec\pi, a} \Big[ \int_0^T e^{-\rho t} C(\vec\Pi_t, a) \, dt \Big]. \tag{2.9} \]

Also in reference to (1.3), we will consider the restricted problems

\[ U_n(T, \vec\pi, a) = \sup_{\xi \in \mathcal{U}_n(T)} J^\xi(T, \vec\pi, a), \quad n \ge 1, \tag{2.10} \]

in which $\mathcal{U}_n(T)$ is a subset of $\mathcal{U}(T)$ which contains strategies with at most $n \ge 1$ interventions up
to time $T$.
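
Because $C(\cdot, a)$ is linear and $\mathbb{E}^{\vec\pi}[\Pi_i(t)] = \mathbb{P}^{\vec\pi}\{M_t = i\}$ by the tower property, the no-action value in (2.9) reduces to the deterministic integral $\int_0^T e^{-\rho t} \sum_{i \in E} c_i(a) \, (\vec\pi e^{tQ})_i \, dt$. The following sketch evaluates it by integrating the forward equation $dp/dt = pQ$ with a Runge-Kutta step and applying the trapezoidal rule. The function name, step count and the symmetric two-state test case (for which a closed form is available) are assumptions chosen for illustration.

```python
import math

def no_action_value(T, pi, c_a, Q, rho, steps=2000):
    """Approximate U_0(T, pi, a) = int_0^T e^{-rho t} sum_i c_i(a) P(M_t = i) dt."""
    m = len(pi)

    def forward(p):
        # Kolmogorov forward equation dp/dt = p Q for the law of M_t
        return [sum(p[j] * Q[j][i] for j in range(m)) for i in range(m)]

    def shifted(p, k, w):
        return [pj + w * kj for pj, kj in zip(p, k)]

    h = T / steps
    p = list(pi)
    prev = sum(ci * pj for ci, pj in zip(c_a, p))    # integrand at t = 0
    total = 0.0
    for n in range(1, steps + 1):
        k1 = forward(p)
        k2 = forward(shifted(p, k1, h / 2))
        k3 = forward(shifted(p, k2, h / 2))
        k4 = forward(shifted(p, k3, h))
        p = [pj + h / 6 * (a + 2 * b + 2 * c + d)
             for pj, a, b, c, d in zip(p, k1, k2, k3, k4)]
        cur = math.exp(-rho * n * h) * sum(ci * pj for ci, pj in zip(c_a, p))
        total += 0.5 * h * (prev + cur)              # trapezoidal rule
        prev = cur
    return total

# Assumed symmetric two-state chain with switching rate q, where
# P(M_t = 0) - P(M_t = 1) = (pi_0 - pi_1) * exp(-2 q t) gives a closed form.
q, rho, T = 1.5, 0.1, 1.0
pi, c_a = [0.8, 0.2], [1.0, -2.0]
u0 = no_action_value(T, pi, c_a, [[-q, q], [q, -q]], rho)
u0_exact = ((c_a[0] + c_a[1]) / 2 * (1 - math.exp(-rho * T)) / rho
            + (c_a[0] - c_a[1]) / 2 * (pi[0] - pi[1])
            * (1 - math.exp(-(rho + 2 * q) * T)) / (rho + 2 * q))
```

No filtering is needed here, since the policy never reacts to the observations; this is what makes $U_0$ a convenient starting point for the iterations introduced later.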
2.3. Sample paths of $\vec\Pi$. In this section we describe the filtering procedure of the controller, i.e.
the evolution of the conditional probability process $\vec\Pi$. Proposition 2.2 explicitly shows that the
processes $\vec\Pi$ and $(\vec\Pi, \xi)$ are piecewise deterministic processes and hence have the strong Markov
property, Davis [1993]. This description of paths of the conditional probability process is also
discussed in Proposition 2.1 in Ludkovski and Sezer [2007] and Proposition 2.1 of Bayraktar and
Sezer [2006]. We summarize the needed results below.
Let

\[ I(t) = \int_0^t \sum_{i \in E} \lambda_i 1_{\{M_s = i\}} \, ds, \tag{2.11} \]

so that the probability of no events for the next $u$ time units is $\mathbb{P}^{\vec\pi}\{\sigma_1 > u\} = \mathbb{E}^{\vec\pi}[e^{-I(u)}]$. Then
for $\sigma_\ell \le t \le t + u < \sigma_{\ell+1}$, we have

\[ \Pi_i(t + u) = \frac{\mathbb{P}^{\vec\pi}\{\sigma_1 > u, M_u = i\}}{\mathbb{P}^{\vec\pi}\{\sigma_1 > u\}} \bigg|_{\vec\pi = \vec\Pi(t)}. \tag{2.12} \]

On the other hand, upon an arrival of size $Y_{\ell+1}$, the conditional probability $\vec\Pi$ experiences a jump

\[ \Pi_i(\sigma_{\ell+1}) = \frac{\lambda_i f_i(Y_{\ell+1}) \Pi_i(\sigma_{\ell+1}-)}{\sum_{j \in E} \lambda_j f_j(Y_{\ell+1}) \Pi_j(\sigma_{\ell+1}-)}, \quad \text{for } \ell \in \mathbb{N}. \tag{2.13} \]
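
The jump in (2.13) is a plain Bayes update in which state $i$ is reweighted by $\lambda_i f_i(y)$. A minimal sketch follows; the function name `jump_update` and the numerical example (simple Poisson observations, so that the marks carry no information and $f_i \equiv 1$) are illustrative assumptions.

```python
def jump_update(pi_minus, lam, f_at_y):
    """Belief after an arrival with mark y, following the Bayes rule in (2.13).

    pi_minus: belief just before the arrival; lam[i]: arrival rate in state i;
    f_at_y[i]: density f_i(y) of the observed mark under nu_i.
    """
    weights = [l * f * p for l, f, p in zip(lam, f_at_y, pi_minus)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("observed mark has zero likelihood under the current belief")
    return [w / total for w in weights]

# Assumed example: two states with arrival rates 1 and 3 and uninformative marks.
posterior = jump_update([0.5, 0.5], lam=[1.0, 3.0], f_at_y=[1.0, 1.0])
```

In this example an arrival shifts the belief from $(0.5, 0.5)$ to $(0.25, 0.75)$, that is, toward the state with the higher arrival rate.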

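To simplify (2.12), define $\vec x(t, \vec\pi) = (x_1(t, \vec\pi), \ldots, x_m(t, \vec\pi))$ via

\[ x_i(t, \vec\pi) = \frac{\mathbb{P}^{\vec\pi}\{\sigma_1 > t, M_t = i\}}{\mathbb{P}^{\vec\pi}\{\sigma_1 > t\}} = \frac{\mathbb{E}^{\vec\pi}[1_{\{M_t = i\}} e^{-I(t)}]}{\mathbb{E}^{\vec\pi}[e^{-I(t)}]}, \quad i \in E. \tag{2.14} \]

It can be checked easily that the paths $t \mapsto \vec x(t, \vec\pi)$ have the semigroup property $\vec x(t + u, \vec\pi) =
\vec x(u, \vec x(t, \vec\pi))$. In fact, $\vec x$ can be described as a solution of coupled first-order ordinary differential
equations. To observe this fact first recall [Darroch and Morris, 1968, Neuts, 1989, Karlin and
Taylor, 1981] that the vector

\[ \vec m(t, \vec\pi) = (m_1(t, \vec\pi), \ldots, m_m(t, \vec\pi)) \triangleq \big( \mathbb{E}^{\vec\pi}[1_{\{M_t = 1\}} e^{-I(t)}], \ldots, \mathbb{E}^{\vec\pi}[1_{\{M_t = m\}} e^{-I(t)}] \big) \tag{2.15} \]

has the form

\[ \vec m(t, \vec\pi) = \vec\pi \cdot e^{t(Q - \Lambda)}, \]

where $\Lambda$ is the $m \times m$ diagonal matrix with $\Lambda_{i,i} = \lambda_i$. Thus, the components of $\vec m(t, \vec\pi)$ solve
$dm_i(t, \vec\pi)/dt = -\lambda_i m_i(t, \vec\pi) + \sum_{j \in E} m_j(t, \vec\pi) \, q_{j,i}$; together with the chain rule and (2.14) we
obtain

\[ \frac{dx_i(t, \vec\pi)}{dt} = \Big( \sum_{j \in E} q_{j,i} x_j(t, \vec\pi) - \lambda_i x_i(t, \vec\pi) + x_i(t, \vec\pi) \sum_{j \in E} \lambda_j x_j(t, \vec\pi) \Big). \tag{2.16} \]

For the sequel we note again that $\mathbb{P}^{\vec\pi}\{\sigma_1 \in du, M_u = i\} = \mathbb{E}^{\vec\pi}[\lambda_i 1_{\{M_u = i\}} e^{-I(u)}] \, du = \lambda_i m_i(u, \vec\pi) \, du$.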
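
Between arrivals the belief follows the deterministic flow defined by the ordinary differential equation (2.16), which can be integrated with any standard scheme. The sketch below uses a classical fourth-order Runge-Kutta step and checks two facts stated above: the semigroup property, and the closed form $x_i(t, \vec\pi) \propto \pi_i e^{-\lambda_i t}$ that follows from $\vec m(t, \vec\pi) = \vec\pi e^{t(Q - \Lambda)}$ when $Q = 0$. The function names, step count and parameter values are assumptions made for illustration.

```python
import math

def drift(x, Q, lam):
    """Right-hand side of the ODE (2.16) for the belief between arrivals."""
    m = len(x)
    mean_rate = sum(lam[j] * x[j] for j in range(m))
    return [sum(Q[j][i] * x[j] for j in range(m)) - lam[i] * x[i] + x[i] * mean_rate
            for i in range(m)]

def flow(t, pi, Q, lam, steps=1000):
    """Approximate x(t, pi) with the classical fourth-order Runge-Kutta scheme."""
    h = t / steps
    x = list(pi)
    for _ in range(steps):
        k1 = drift(x, Q, lam)
        k2 = drift([xi + h / 2 * k for xi, k in zip(x, k1)], Q, lam)
        k3 = drift([xi + h / 2 * k for xi, k in zip(x, k2)], Q, lam)
        k4 = drift([xi + h * k for xi, k in zip(x, k3)], Q, lam)
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

# Assumed two-state example.
Q = [[-1.0, 1.0], [2.0, -2.0]]
lam = [1.0, 5.0]
pi = [0.3, 0.7]

direct = flow(1.0, pi, Q, lam)                          # x(1.0, pi)
composed = flow(0.7, flow(0.3, pi, Q, lam), Q, lam)     # x(0.7, x(0.3, pi))

# Static hidden state (Q = 0): compare with x_i(t) proportional to pi_i * exp(-lam_i * t).
static = flow(1.0, pi, [[0.0, 0.0], [0.0, 0.0]], lam)
weights = [p * math.exp(-l * 1.0) for p, l in zip(pi, lam)]
static_exact = [w / sum(weights) for w in weights]
```

The drift sums to zero on the simplex, so the numerical path keeps total mass one up to rounding error.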
The preceding equations (2.12) and (2.13) imply that 
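Proposition 2.2. The process $\vec\Pi$ is a piecewise-deterministic, $(\mathbb{P}, \mathbb{F})$-Markov process. The paths
have the characterization

\[ \begin{cases} \vec\Pi(t) = \vec x\big(t - \sigma_\ell, \vec\Pi(\sigma_\ell)\big), & \sigma_\ell \le t < \sigma_{\ell+1}, \ \ell \in \mathbb{N}, \\[6pt] \vec\Pi(\sigma_\ell) = \Big( \dfrac{\lambda_1 f_1(Y_\ell) \Pi_1(\sigma_\ell-)}{\sum_{j \in E} \lambda_j f_j(Y_\ell) \Pi_j(\sigma_\ell-)}, \ldots, \dfrac{\lambda_m f_m(Y_\ell) \Pi_m(\sigma_\ell-)}{\sum_{j \in E} \lambda_j f_j(Y_\ell) \Pi_j(\sigma_\ell-)} \Big). \end{cases} \tag{2.17} \]

Alternatively, we can describe $\vec\Pi$ in terms of the random measure $p$,

\[ d\Pi_i(t) = \mu_i(\vec\Pi(t-)) \, dt + \int_{\mathbb{R}^d} J_i(\vec\Pi(t-), y) \, p(dt, dy), \]

for all $i \in E$, where

\[ \mu_i(\vec\pi) = \sum_{j \in E} q_{j,i} \pi_j + \pi_i \Big( \sum_{j \in E} \lambda_j \pi_j - \lambda_i \Big), \quad \text{and} \quad J_i(\vec\pi, y) = \pi_i \cdot \Big( \frac{\lambda_i f_i(y)}{\sum_{j \in E} \lambda_j f_j(y) \pi_j} - 1 \Big). \tag{2.18} \]

Here, one should also note that the $(\mathbb{P}, \mathbb{F})$-compensator of the random measure $p$ is

\[ \tilde p((0, t] \times A) = \int_0^t \int_A \sum_{j \in E} \lambda_j f_j(y) \Pi_j(s) \, \nu(dy) \, ds, \quad t \ge 0, \ A \text{ Borel}. \]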

In more general models with point process observations, an explicit filter for $\vec\Pi$ would not
be available and one would have to resort to simulation-based approaches, see e.g. Chopin and 
Varini [2007]. The subsequent optimization step would then appear to be intractable, though an 
integrated Markov chain Monte Carlo paradigm for filtering and optimization was proposed in 
Muller et al. [2004]. 



3. Two Dynamic Programming Equations for the Value Function 

In this section we establish two dynamic programming equations for the value function U. The 
first key equation (3.13) reduces the solution of the problem (1.3) to studying a system of coupled 
optimal stopping problems. The second dynamic programming principle of Proposition 3.4 shows 
that the value function is also the fixed point of a first jump operator. The latter representation 
will be useful in the numerical computations. 

3.1. Coupled Optimal Stopping Operator. In this section we show that U solves a coupled 
optimal stopping problem. Combined with regularity results in Section 4, this leads to a direct 
characterization of an optimal strategy. The analysis of this section parallels the general framework 
of impulse control of piecewise deterministic processes (pdp) developed by Costa and Davis [1989], 
Lenhart and Liao [1988]. It is also related to optimal stopping of pdp's studied in Gugerli [1986], 
Costa and Davis [1988]. 
Let us introduce a functional operator $\mathcal{M}$ whose action on a test function $w$ is

\[ \mathcal{M} w(T, \vec\pi, a) = \max_{b \in \mathcal{A}, \, b \ne a} \big\{ w(T, \vec\pi, b) - K(a, b, \vec\pi) \big\}. \tag{3.1} \]

The operator $\mathcal{M}$ is called the intervention operator and denotes the maximum value that can be
achieved if an immediate best change is carried out to the current policy. Assuming some ordering on
the finite policy set $\mathcal{A}$, let us denote the smallest policy choice achieving the maximum in (3.1) as

\[ d_{\mathcal{M} w}(T, \vec\pi, a) = \min \big\{ b \in \mathcal{A} : w(T, \vec\pi, b) - K(a, b, \vec\pi) = \mathcal{M} w(T, \vec\pi, a) \big\}. \tag{3.2} \]
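
The intervention operator and its minimal maximizer can be evaluated by a direct scan over the finite policy set. The sketch below is a hedged illustration: the function name `intervention`, the layout `K[i][a][b]` for the state-dependent switching costs and the toy test function `w` are assumptions, and the maximum is taken over policies different from the current one.

```python
def intervention(w, T, pi, a, K, policies):
    """Return (M w(T, pi, a), d_{Mw}(T, pi, a)) in the sense of (3.1)-(3.2).

    w(T, pi, b): test function; K[i][a][b]: switching cost from a to b when M = i.
    Assumption: the maximum runs over policies b different from a.
    """
    best_value, best_policy = None, None
    for b in sorted(policies):                  # ascending order of the policy set
        if b == a:
            continue
        # averaged switching cost K(a, b, pi) = sum_i K_i(a, b) * pi_i, as in (2.7)
        cost = sum(K[i][a][b] * pi[i] for i in range(len(pi)))
        value = w(T, pi, b) - cost
        if best_value is None or value > best_value:   # strict: ties keep the smallest b
            best_value, best_policy = value, b
    return best_value, best_policy

# Assumed toy data: three policies, two hidden states, constant switching cost 0.5.
K = [[[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]] * 2
w = lambda T, pi, b: (1.0, 2.0, 2.0)[b]
value, policy = intervention(w, 1.0, [0.4, 0.6], 0, K, policies=[0, 1, 2])
```

Policies 1 and 2 tie in this example, and the strict comparison selects the smaller one, matching the tie-breaking convention of (3.2).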



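The main object of study in this section is another functional operator $\mathcal{G}$ whose action is
described by the following optimal stopping problem:

\[ \mathcal{G} V(T, \vec\pi, a) = \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi, a} \Big[ \int_0^\tau e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \tau} \mathcal{M} V(T - \tau, \vec\Pi_\tau, a) \Big] \tag{3.3} \]

for $T \in \mathbb{R}_+$, $\vec\pi \in D$, and $a \in \mathcal{A}$. We set $V_0 = U_0$ from (2.9) and iterating $\mathcal{G}$ obtain the following
sequence of functions:

\[ V_{n+1} = \mathcal{G} V_n, \quad n \ge 0. \tag{3.4} \]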

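Lemma 3.1. $(V_n)_{n \in \mathbb{N}}$ is an increasing sequence of functions.
In Section 4 we will further show that $(V_n)$ are convex and continuous.
Proof. The statement follows since

\[ \begin{aligned} V_1(T, \vec\pi, a) = \mathcal{G} V_0(T, \vec\pi, a) &= \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi, a} \Big[ \int_0^\tau e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \tau} \mathcal{M} V_0(T - \tau, \vec\Pi_\tau, a) \Big] \\ &\ge \mathbb{E}^{\vec\pi, a} \Big[ \int_0^T e^{-\rho s} C(\vec\Pi_s, a) \, ds \Big] = U_0(T, \vec\pi, a) = V_0(T, \vec\pi, a), \end{aligned} \]

and since $\mathcal{G}$ is a monotone/positive operator, i.e. for any two functions $f_1 \le f_2$ we have $\mathcal{G} f_1 \le \mathcal{G} f_2$. □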

The following proposition shows that the value functions $(U_n)_{n \in \mathbb{N}}$ of (2.10), which correspond
to the restricted control problems over $\mathcal{U}_n(T)$, can be alternatively obtained via the sequence of
iterated optimal stopping problems in (3.4).

Proposition 3.1. $U_n = V_n$ for $n \in \mathbb{N}$.

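Proof. By definition we have that $U_0 = V_0$. Let us assume that $U_n = V_n$ and show that $U_{n+1} =
V_{n+1}$. We will carry out the proof in two steps.

Step 1. First we will show that $U_{n+1} \le V_{n+1}$. Let $\xi \in \mathcal{U}_{n+1}(T)$,

\[ \xi_t = \sum_{k=0}^{n+1} \xi_k \cdot 1_{[\tau_k, \tau_{k+1})}(t), \quad t \in [0, T], \]

be $\varepsilon$-optimal for the problem in (2.10), i.e.,

\[ U_{n+1}(T, \vec\pi, a) - \varepsilon \le J^\xi(T, \vec\pi, a). \tag{3.5} \]

Let $\tilde\xi \in \mathcal{U}_n(T)$ be defined as

\[ \tilde\xi_t = \sum_{k=0}^{n} \tilde\xi_k \cdot 1_{[\tilde\tau_k, \tilde\tau_{k+1})}(t), \quad t \in [0, T], \]

in which $\tilde\tau_0 = 0$, $\tilde\xi_0 = \xi_1$, and $\tilde\tau_n = \tau_{n+1}$, $\tilde\xi_n = \xi_{n+1}$, for $n \in \mathbb{N}_+$. Using the strong Markov property
of $(\vec\Pi, \xi)$, we can write $J^\xi$ as

\[ \begin{aligned} J^\xi(T, \vec\pi, a) &= \mathbb{E}^{\vec\pi, a} \Big[ \int_0^{\tau_1} e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \tau_1} \big( J^{\tilde\xi}(T - \tau_1, \vec\Pi_{\tau_1}, \xi_1) - K(a, \xi_1, \vec\Pi_{\tau_1}) \big) \Big] \\ &\le \mathbb{E}^{\vec\pi, a} \Big[ \int_0^{\tau_1} e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \tau_1} \big( V_n(T - \tau_1, \vec\Pi_{\tau_1}, \xi_1) - K(a, \xi_1, \vec\Pi_{\tau_1}) \big) \Big] \\ &\le \mathbb{E}^{\vec\pi, a} \Big[ \int_0^{\tau_1} e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \tau_1} \mathcal{M} V_n(T - \tau_1, \vec\Pi_{\tau_1}, a) \Big] \\ &\le \mathcal{G} V_n(T, \vec\pi, a) = V_{n+1}(T, \vec\pi, a). \end{aligned} \tag{3.6} \]

Here, the first inequality follows from the induction hypothesis, the second inequality follows from the
definition of $\mathcal{M}$, and the last inequality from the definition of $\mathcal{G}$. As a result of (3.5) and (3.6) we
have that $U_{n+1} \le V_{n+1}$ since $\varepsilon > 0$ is arbitrary.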

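Step 2. To show the opposite inequality $U_{n+1} \ge V_{n+1}$, we will construct a special $\bar\xi \in \mathcal{U}_{n+1}(T)$.
To this end let us introduce

\[ \begin{aligned} \bar\tau_1 &= \inf\{t \ge 0 : \mathcal{M} V_n(T - t, \vec\Pi_t, a) \ge V_{n+1}(T - t, \vec\Pi_t, a) - \varepsilon\}, \\ \bar\xi_1 &= d_{\mathcal{M} V_n}(T - \bar\tau_1, \vec\Pi_{\bar\tau_1}, a). \end{aligned} \tag{3.7} \]

Let $\hat\xi_t = \sum_{k=0}^{n} \hat\xi_k \cdot 1_{[\hat\tau_k, \hat\tau_{k+1})}(t)$, $\hat\xi \in \mathcal{U}_n(T)$, be $\varepsilon$-optimal for the problem in which $n$ interventions
are allowed, i.e. (2.10). Using $\hat\xi$ we now complete the description of the control $\bar\xi \in \mathcal{U}_{n+1}(T)$ by
assigning

\[ \bar\tau_{n+1} = \hat\tau_n \circ \theta_{\bar\tau_1}, \qquad \bar\xi_{n+1} = \hat\xi_n \circ \theta_{\bar\tau_1}, \quad n \in \mathbb{N}_+, \tag{3.8} \]

in which $\theta$ is the classical shift operator used in the theory of Markov processes.

Note that $\bar\tau_1$ is an $\varepsilon$-optimal stopping time for the stopping problem in the definition of $\mathcal{G} V_n$.
This follows from the classical optimal stopping theory since the process $\vec\Pi$ has the strong Markov
property. Therefore,

\[ \begin{aligned} V_{n+1}(T, \vec\pi, a) - \varepsilon &\le \mathbb{E}^{\vec\pi, a} \Big[ \int_0^{\bar\tau_1} e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \bar\tau_1} \mathcal{M} V_n(T - \bar\tau_1, \vec\Pi_{\bar\tau_1}, a) \Big] \\ &\le \mathbb{E}^{\vec\pi, a} \Big[ \int_0^{\bar\tau_1} e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \bar\tau_1} \big( U_n(T - \bar\tau_1, \vec\Pi_{\bar\tau_1}, \bar\xi_1) - K(a, \bar\xi_1, \vec\Pi_{\bar\tau_1}) \big) \Big], \end{aligned} \tag{3.9} \]

in which the second inequality follows from the definition of $\bar\xi_1$ and the induction hypothesis. It
follows from (3.9) and the strong Markov property of $(\vec\Pi, \xi)$ that

\[ \begin{aligned} V_{n+1}(T, \vec\pi, a) - 2\varepsilon &\le \mathbb{E}^{\vec\pi, a} \Big[ \int_0^{\bar\tau_1} e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \bar\tau_1} \big( U_n(T - \bar\tau_1, \vec\Pi_{\bar\tau_1}, \bar\xi_1) - \varepsilon - K(a, \bar\xi_1, \vec\Pi_{\bar\tau_1}) \big) \Big] \\ &\le \mathbb{E}^{\vec\pi, a} \Big[ \int_0^{\bar\tau_1} e^{-\rho s} C(\vec\Pi_s, a) \, ds + e^{-\rho \bar\tau_1} \big( J^{\hat\xi}(T - \bar\tau_1, \vec\Pi_{\bar\tau_1}, \bar\xi_1) - K(a, \bar\xi_1, \vec\Pi_{\bar\tau_1}) \big) \Big] \\ &= J^{\bar\xi}(T, \vec\pi, a) \le U_{n+1}(T, \vec\pi, a). \end{aligned} \tag{3.10} \]

This completes the proof of the second step since $\varepsilon > 0$ is arbitrary. □

Proposition 3.2. $\lim_{n \to \infty} V_n(T, \vec\pi, a) = U(T, \vec\pi, a)$, for any $T \in \mathbb{R}_+$, $\vec\pi \in D$, $a \in \mathcal{A}$.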



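Proof. Fix $(T, \vec\pi, a)$. The monotone limit $V(T, \vec\pi, a) = \lim_{n \to \infty} V_n(T, \vec\pi, a)$ exists as a result of Lemma 3.1. Since $\mathcal{U}_n(T) \subset \mathcal{U}(T)$, it follows that $V_n(T, \vec\pi, a) = U_n(T, \vec\pi, a) \le U(T, \vec\pi, a)$. Therefore $V(T, \vec\pi, a) \le U(T, \vec\pi, a)$. In the remainder of the proof we will show that $V(T, \vec\pi, a) \ge U(T, \vec\pi, a)$.

Let $\xi \in \mathcal{U}(T)$ be given, and let $\xi^n_t := \xi_{t \wedge \tau_n}$, $\xi^n \in \mathcal{U}_n(T)$, correspond to $\xi$ up to its $n$-th switch. Then

\[ \big| J^{\xi}(T, \vec\pi, a) - J^{\xi^n}(T, \vec\pi, a) \big| \le \mathbb{E}^{\vec\pi,a}\Big[ \int_{\tau_n}^{T} e^{-\rho s} \big| C(\vec\Pi_s, \xi_s) - C(\vec\Pi_s, \xi_{\tau_n}) \big|\,ds + \sum_{k \ge n+1} e^{-\rho \tau_k} K(\xi_{k-1}, \xi_k, \vec\Pi_{\tau_k}) \Big]. \tag{3.11} \]

Now, the right-hand side of (3.11) converges to 0 as $n \to \infty$. On the one hand, by the monotone convergence theorem and (2.8),

\[ \lim_{n \to \infty} \mathbb{E}^{\vec\pi,a}\Big[ \sum_{k \ge n+1} e^{-\rho \tau_k} K(\xi_{k-1}, \xi_k, \vec\Pi_{\tau_k}) \Big] = 0. \]

On the other hand, since there are only finitely many switches almost surely for any given path,

\[ \lim_{n \to \infty} \int_0^T 1_{\{s > \tau_n\}} e^{-\rho s} \big| C(\vec\Pi_s, \xi_s) - C(\vec\Pi_s, \xi_{\tau_n}) \big|\,ds = 0, \]

and $\int_0^T e^{-\rho s} | C(\vec\Pi_s, \xi_s) - C(\vec\Pi_s, \xi_{\tau_n}) |\,ds \le 2cT$. Therefore, the dominated convergence theorem implies that

\[ \lim_{n \to \infty} \mathbb{E}^{\vec\pi,a}\Big[ \int_{\tau_n}^{T} e^{-\rho s} \big| C(\vec\Pi_s, \xi_s) - C(\vec\Pi_s, \xi_{\tau_n}) \big|\,ds \Big] = 0. \]

As a result, for any $\varepsilon > 0$ and $n$ large enough, we find

\[ \big| J^{\xi}(T, \vec\pi, a) - J^{\xi^n}(T, \vec\pi, a) \big| \le \varepsilon. \]

Now, since $\xi^n \in \mathcal{U}_n(T)$ we have $V_n(T, \vec\pi, a) = U_n(T, \vec\pi, a) \ge J^{\xi^n}(T, \vec\pi, a) \ge J^{\xi}(T, \vec\pi, a) - \varepsilon$ for sufficiently large $n$, and it follows that

\[ V(T, \vec\pi, a) = \lim_{n \to \infty} V_n(T, \vec\pi, a) \ge J^{\xi}(T, \vec\pi, a) - \varepsilon. \tag{3.12} \]

Since $\xi$ and $\varepsilon$ are arbitrary, we have the desired result. □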

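Proposition 3.3. The value function $U$ is the smallest solution of the dynamic programming equation $\mathcal{Q}U = U$ such that $U \ge U_0$. Thus,

\[ U(T, \vec\pi, a) = \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tau} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho \tau} \mathcal{M}U(T - \tau, \vec\Pi_{\tau}, a) \Big]. \tag{3.13} \]

Proof. Step 1. First we will show that $U$ is a fixed point of $\mathcal{Q}$. Since $V_n \le U$, monotonicity of $\mathcal{Q}$ implies that

\[ V_{n+1}(T, \vec\pi, a) \le \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tau} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho \tau} \mathcal{M}U(T - \tau, \vec\Pi_{\tau}, a) \Big]. \]

Taking the limit of the left-hand side with respect to $n$ and using Lemma 3.1 and Proposition 3.2 we have

\[ U(T, \vec\pi, a) \le \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tau} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho \tau} \mathcal{M}U(T - \tau, \vec\Pi_{\tau}, a) \Big]. \]

Let us obtain the reverse inequality. Let $\tilde\tau \in \mathcal{S}(T)$ be an $\varepsilon$-optimal stopping time for the problem in the definition of $\mathcal{Q}U$, i.e.,

\[ \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tilde\tau} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho \tilde\tau} \mathcal{M}U(T - \tilde\tau, \vec\Pi_{\tilde\tau}, a) \Big] \ge \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tau} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho \tau} \mathcal{M}U(T - \tau, \vec\Pi_{\tau}, a) \Big] - \varepsilon. \tag{3.14} \]

Then, as a result of the monotone convergence theorem and Proposition 3.2,

\[
\begin{aligned}
U(T, \vec\pi, a) = \lim_{n \to \infty} V_n(T, \vec\pi, a) &\ge \lim_{n \to \infty} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tilde\tau} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho \tilde\tau} \mathcal{M}V_{n-1}(T - \tilde\tau, \vec\Pi_{\tilde\tau}, a) \Big] \\
&= \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tilde\tau} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho \tilde\tau} \mathcal{M}U(T - \tilde\tau, \vec\Pi_{\tilde\tau}, a) \Big].
\end{aligned}
\tag{3.15}
\]

Now, (3.14) and (3.15) together yield the desired result since $\varepsilon$ is arbitrary.

Step 2. Let $\tilde U$ be another fixed point of $\mathcal{Q}$ satisfying $\tilde U \ge U_0 = V_0$. Then an induction argument shows that $\tilde U \ge U$: assume that $\tilde U \ge V_n$. Then $\tilde U = \mathcal{Q}\tilde U \ge \mathcal{Q}V_n = V_{n+1}$, by the monotonicity of $\mathcal{Q}$. Therefore $\tilde U \ge V_n$ for all $n$, which implies that $\tilde U \ge \sup_n V_n = U$.

□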
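The qualifier "smallest solution such that $U \ge U_0$" can be illustrated on a toy scalar problem. The sketch below does not implement the operator of the paper: `toy_operator` is a hypothetical monotone map on the real line, chosen only because every $x \ge 2$ is a fixed point. Iterating from a starting value below the fixed points, in the manner of $V_{n+1} = \mathcal{Q}V_n$ started at $V_0 = U_0$, selects the smallest of them.

```python
def toy_operator(x):
    """Hypothetical monotone map whose fixed points are all x >= 2 (not the paper's operator)."""
    return max(0.5 * x + 1.0, x)


def iterate_from_below(operator, start, tol=1e-12, max_iter=10_000):
    """Iterate v_{n+1} = operator(v_n) from `start` until the increment falls below `tol`."""
    value = start
    for _ in range(max_iter):
        new_value = operator(value)
        if abs(new_value - value) < tol:
            return new_value
        value = new_value
    return value


# Starting below every fixed point, the increasing iterates approach 2, the smallest one.
limit = iterate_from_below(toy_operator, start=0.0)
```

Starting the same iteration at, say, 5 would stay at 5, which is why the characterization needs both the lower bound and the word "smallest".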

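To illustrate the nature of (3.13) consider the special case where $A = \{1, 2\}$, so that only two types of policies are available. In that case the intervention operator $\mathcal{M}$ is trivial, $\mathcal{M}U(t, \vec\pi, a) = U(t, \vec\pi, 3 - a) - K(a, 3 - a, \vec\pi)$. For ease of notation we write $U(t, \vec\pi, 1) =: V(t, \vec\pi)$, $U(t, \vec\pi, 2) =: W(t, \vec\pi)$. It follows that (3.13) can be written as two coupled optimal stopping problems:

\[
\begin{aligned}
V(T, \vec\pi) &= \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi,1}\Big[ \int_0^{\tau} e^{-\rho s} C(\vec\Pi_s, 1)\,ds + e^{-\rho \tau}\big( W(T - \tau, \vec\Pi_{\tau}) - K(1, 2, \vec\Pi_{\tau}) \big) \Big], \\
W(T, \vec\pi) &= \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi,2}\Big[ \int_0^{\tau} e^{-\rho s} C(\vec\Pi_s, 2)\,ds + e^{-\rho \tau}\big( V(T - \tau, \vec\Pi_{\tau}) - K(2, 1, \vec\Pi_{\tau}) \big) \Big].
\end{aligned}
\]

The next section discusses how to solve such coupled systems.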

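Remark 3.1. The value function $U(T, \cdot, a)$ is uniformly bounded. Indeed,

\[ U(T, \vec\pi, a) \ge U_0(T, \vec\pi, a) = \mathbb{E}^{\vec\pi,a}\Big[ \int_0^T e^{-\rho s} C(\vec\Pi_s, a)\,ds \Big] \ge -\int_0^T e^{-\rho s} c\,ds, \]

and conversely for any $\xi \in \mathcal{U}(T)$,

\[ J^{\xi}(T, \vec\pi, a) \le \mathbb{E}^{\vec\pi,a}\Big[ \int_0^T e^{-\rho s} C(\vec\Pi_s, \xi_s)\,ds \Big] \le \int_0^T e^{-\rho s} c\,ds. \]

Since

\[ \int_0^T e^{-\rho s} c\,ds \le \begin{cases} cT & \text{when } \rho = 0; \\ c/\rho & \text{when } \rho > 0, \end{cases} \]

we see that when $\rho > 0$ those bounds are even uniform in $T$.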

Remark 3.2. One may extend the above analysis to cover the slightly more general case where $K(a, b, \vec\pi)$ are allowed to be negative, as long as we assume that for any chain $a_0, a_1, \ldots, a_n \in A$ we have

\[ K(a_0, a_1, \vec\pi) + K(a_1, a_2, \vec\pi) + \ldots + K(a_n, a_0, \vec\pi) \ge k_0 > 0, \]

uniformly. This condition implies that repeated switching is unprofitable and guarantees that the number of switches along any path is finite with probability one. Then taking $A' = \{0, \Delta_1, \ldots, \Delta_A\}$ and, for any $i \in A$, $K(0, \Delta_i, \vec\pi) = -\sum_{j \in E} H(\Delta_i, j)\pi_j$, $K(\Delta_i, 0, \vec\pi) = +\infty$, $C(\vec\pi, \Delta_i) = 0$, one may embed the optimal stopping problems studied in Bayraktar and Sezer [2006] and Ludkovski and Sezer [2007] in our framework. Namely, it is easy to see that in this case



(3.16) [/(T,7f,0)= sup 



e-P'C{Us, 0) ds + e-P^H{^i, Mr) 







In that sense, our model is a direct extension of optimal stopping problems for hidden Markov 
models with Poissonian observations. 

Remark 3.3. Using the dynamic programming principle developed in Proposition 3.3 one expects that the value function $U$ is the unique weak solution of a coupled system of QVIs (quasi-variational inequalities)

\[
\begin{aligned}
& -\frac{\partial}{\partial T} U(T, \vec\pi, a) + \mathcal{A}U(T, \vec\pi, a) - \rho U(T, \vec\pi, a) + C(\vec\pi, a) \le 0, \\
& U(T, \vec\pi, a) \ge \mathcal{M}U(T, \vec\pi, a), \\
& \Big( -\frac{\partial}{\partial T} U(T, \vec\pi, a) + \mathcal{A}U(T, \vec\pi, a) - \rho U(T, \vec\pi, a) + C(\vec\pi, a) \Big) \big( U(T, \vec\pi, a) - \mathcal{M}U(T, \vec\pi, a) \big) = 0.
\end{aligned}
\tag{3.17}
\]

Here $\mathcal{A}$ is the infinitesimal generator of the process $\vec\Pi$ given by (2.9); $\mathcal{A}$ is a first-order integro-differential operator. Note that the differential operators do not differentiate with respect to $a$, therefore for each $a$ we obtain a different QVI. These QVIs are coupled by the action of the intervention operator $\mathcal{M}$.

One could attempt to solve the above system of QVIs numerically. However, the theoretical basis for the QVI formulation requires justification, in particular in terms of the regularity of the value function $U$. Typically one must pass to the realm of viscosity solutions to make progress; in contrast, in the next section we will develop another, more direct characterization of the value function (see Proposition 3.4). In Section 4 we will use this characterization to develop the regularity properties of $U$, which helps us describe an optimal control. The characterization in Proposition 3.4 also provides a numerical method for computing the value function.

3.2. First Jump Operator. The following Proposition 3.4 shows that the value function $U$ satisfies a second dynamic programming principle, namely $U$ is a fixed point of the first jump operator $L$. This representation will be used in our numerical computations in Section 6. Let us introduce a functional operator $L$ whose action on test functions $V$ and $H$ is given by

\[ L(V, H)(T, \vec\pi, a) = \sup_{t \in [0,T]} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{t \wedge \sigma_1} e^{-\rho s} C(\vec\Pi_s, a)\,ds + 1_{\{t < \sigma_1\}} e^{-\rho t} H(T - t, \vec\Pi_t, a) + e^{-\rho \sigma_1} 1_{\{t \ge \sigma_1\}} V(T - \sigma_1, \vec\Pi_{\sigma_1}, a) \Big]. \tag{3.18} \]






ERHAN BAYRAKTAR AND MICHAEL LUDKOVSKI 



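Observe that $L$ is clearly monotone in both of its function arguments. Moreover, we have

\[ L(V, H)(T, \vec\pi, a) = \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tau \wedge \sigma_1} e^{-\rho s} C(\vec\Pi_s, a)\,ds + 1_{\{\tau < \sigma_1\}} e^{-\rho \tau} H(T - \tau, \vec\Pi_{\tau}, a) + e^{-\rho \sigma_1} 1_{\{\tau \ge \sigma_1\}} V(T - \sigma_1, \vec\Pi_{\sigma_1}, a) \Big], \tag{3.19} \]

which follows from the characterization of the stopping times of piecewise deterministic Markov processes (Theorem T.33 in Brémaud [1981] and Theorem A2.3 in Davis [1993]), which states that for any $\tau \in \mathcal{S}(T)$, $\tau \wedge \sigma_1 = t \wedge \sigma_1$ for some constant $t$.

Let us introduce another monotone functional operator by

\[ LV = L(V, \mathcal{M}V). \]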

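Proposition 3.4. $U$ is the smallest fixed point of $L$ that is larger than $U_0$. Moreover, the following sequence, which is constructed by iterating $L$,

\[ W_0 := U_0, \qquad W_{n+1} := LW_n, \quad n \in \mathbb{N}, \tag{3.20} \]

satisfies $W_n \nearrow U$ (pointwise).

Proof. Step 1. Recall that $L$ is a monotone operator and that

\[
\begin{aligned}
W_1(T, \vec\pi, a) = L(U_0, \mathcal{M}U_0)(T, \vec\pi, a) &\ge \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{T \wedge \sigma_1} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho \sigma_1} 1_{\{T \ge \sigma_1\}} U_0(T - \sigma_1, \vec\Pi_{\sigma_1}, a) \Big] \\
&= U_0(T, \vec\pi, a) = W_0(T, \vec\pi, a).
\end{aligned}
\]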

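Therefore $(W_n)_{n \in \mathbb{N}}$ is an increasing sequence of functions. Denote the pointwise limit of this sequence by $W = \sup_n W_n$. This limit is a fixed point of $L$:

\[
\begin{aligned}
W(T, \vec\pi, a) &= \sup_{n \in \mathbb{N}} W_n(T, \vec\pi, a) \\
&= \sup_{n \in \mathbb{N}} \sup_{t \in [0,T]} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{t \wedge \sigma_1} e^{-\rho s} C(\vec\Pi_s, a)\,ds + 1_{\{t < \sigma_1\}} e^{-\rho t} \mathcal{M}W_{n-1}(T - t, \vec\Pi_t, a) + e^{-\rho \sigma_1} 1_{\{t \ge \sigma_1\}} W_{n-1}(T - \sigma_1, \vec\Pi_{\sigma_1}, a) \Big] \\
&= \sup_{t \in [0,T]} \sup_{n \in \mathbb{N}} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{t \wedge \sigma_1} e^{-\rho s} C(\vec\Pi_s, a)\,ds + 1_{\{t < \sigma_1\}} e^{-\rho t} \mathcal{M}W_{n-1}(T - t, \vec\Pi_t, a) + e^{-\rho \sigma_1} 1_{\{t \ge \sigma_1\}} W_{n-1}(T - \sigma_1, \vec\Pi_{\sigma_1}, a) \Big] \\
&= LW(T, \vec\pi, a),
\end{aligned}
\tag{3.21}
\]

where the last line follows from the monotone convergence theorem. In fact $W$ is the smallest of the fixed points of $L$ that are greater than $U_0 = W_0$, which is a result of the following induction argument: suppose that $\tilde W \ge U_0$ is another such fixed point. Then $\tilde W = L\tilde W \ge LU_0 = W_1$. On the other hand, if $\tilde W \ge W_n$, then $\tilde W = L\tilde W \ge LW_n = W_{n+1}$. Now taking the supremum over $n$ we have that $\tilde W \ge W$.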

Step 2. We will now show that $W$ is a fixed point of $\mathcal{Q}$, hence $W \ge U$ as a result of Proposition 3.3. First, we will show that $W \ge \mathcal{Q}W$. Let us construct an increasing sequence of functions by $u_0 = \mathcal{M}W$, $u_{n+1} = L(u_n, \mathcal{M}W)$, $n \in \mathbb{N}$. It can be shown that $u_n$ can be written as

\[ u_n(T, \vec\pi, a) = \sup_{\tau \in \mathcal{S}(T)} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tau \wedge \sigma_n} e^{-\rho s} C(\vec\Pi_s, a)\,ds + e^{-\rho (\tau \wedge \sigma_n)} \mathcal{M}W\big(T - (\tau \wedge \sigma_n), \vec\Pi_{\tau \wedge \sigma_n}, a\big) \Big], \tag{3.22} \]

see e.g. Proposition 5.5 in Bayraktar et al. [2006]. Taking $n \to \infty$ we find that the monotone limit $u = \lim_{n \to \infty} u_n$ satisfies $u = \mathcal{Q}W$. Now, we can show that $W \ge \mathcal{Q}W$ using induction. From Step 1, we know that $W = L(W, \mathcal{M}W)$, therefore $W \ge \mathcal{M}W = u_0$ (since stopping immediately may not be optimal in (3.19)). On the other hand, if $W \ge u_n$, then since $L(\cdot, \mathcal{M}W)$ is a monotone operator, we have that $W = L(W, \mathcal{M}W) \ge L(u_n, \mathcal{M}W) = u_{n+1}$. This implies that $W \ge u_n$ for all $n \in \mathbb{N}$. Therefore, $W \ge \mathcal{Q}W = \sup_n u_n$.

Let us show the reverse inequality: $W \le \mathcal{Q}W$. As a result of the monotone convergence theorem we have that $\mathcal{Q}W = \sup_{n \in \mathbb{N}} \mathcal{Q}W_n$. Clearly $\mathcal{Q}W_n \ge LW_n$ since $W_n \ge \mathcal{M}W_{n-1}$, and the set of stopping times over which the supremum in $L$ is taken is smaller than $\mathcal{S}(T)$. Therefore, $\mathcal{Q}W_n \ge W_{n+1}$. Since we can repeat this argument for all $n$,

\[ \mathcal{Q}W = \sup_{n \in \mathbb{N}} \mathcal{Q}W_n \ge \sup_{n \in \mathbb{N}} W_{n+1} = W. \]

Step 3. We will now show that $W \le U$ (which, together with the result of Step 2, shows that $W = U$). On the one hand, using the strong Markov property of $(\vec\Pi, \xi)$, the value function $U$ can be shown to be a fixed point of $L$ (see Proposition 5.6 in Bayraktar et al. [2006]): recall that $U = \mathcal{Q}U$ (the right-hand side of which is an optimal stopping problem) and compare with (3.19). On the other hand, from Step 1 we know that $W$ is the smallest fixed point of $L$ greater than $U_0$. But this implies that $U \ge W$. □

Remark 3.4. As a result of Fubini's theorem and using (2.13) and (2.15) we can write $L$ as

\[ LV(T, \vec\pi, a) = \sup_{0 \le t \le T} \Big\{ \Big( \sum_{i \in E} m_i(t, \vec\pi) \Big) e^{-\rho t}\, \mathcal{M}V\big(T - t, \vec x(t, \vec\pi), a\big) + \int_0^t e^{-\rho u} \sum_{i \in E} m_i(u, \vec\pi) \big[ C(\vec x(u, \vec\pi), a) + \lambda_i\, S_i V\big(T - u, \vec x(u, \vec\pi), a\big) \big]\,du \Big\}, \tag{3.23} \]

in terms of the operator

\[ S_i w(t, \vec\pi, a) = \int_{\mathbb{R}^d} w\Big( t, \Big( \frac{\lambda_1 f_1(y) \pi_1}{\sum_{j \in E} \lambda_j f_j(y) \pi_j}, \ldots, \frac{\lambda_m f_m(y) \pi_m}{\sum_{j \in E} \lambda_j f_j(y) \pi_j} \Big), a \Big) f_i(y)\,\nu(dy), \quad \text{for } i \in E. \tag{3.24} \]

This implies that one can numerically compute $LV$ by performing the deterministic optimization on the right-hand side of (3.23).
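The deterministic ingredients of (3.23) can be sketched in a few lines. Formulas (2.13) and (2.15) are not restated here, so the code below rests on an assumption: that $m(t, \vec\pi)$ has the standard representation $\vec\pi \exp\big(t(G - \Lambda)\big)$ for a Markov-modulated Poisson process, where $G$ is the generator of $M$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$. The function names and the plain-Python matrix exponential are illustrative choices, not part of the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(a, squarings=10, terms=20):
    """exp(a) by scaling and squaring combined with a truncated Taylor series."""
    n = len(a)
    scaled = [[entry / 2 ** squarings for entry in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes scaled**k / k!
        term = [[entry / k for entry in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def unnormalized_filter(t, pi, generator, rates):
    """m(t, pi) = pi * exp(t * (generator - diag(rates))), under the assumption stated above."""
    n = len(pi)
    a = [[t * (generator[i][j] - (rates[i] if i == j else 0.0)) for j in range(n)]
         for i in range(n)]
    e = mat_exp(a)
    return [sum(pi[i] * e[i][j] for i in range(n)) for j in range(n)]


def normalized_filter(t, pi, generator, rates):
    """x(t, pi) = m(t, pi) / sum_i m_i(t, pi): the belief after t time units without arrivals."""
    m = unnormalized_filter(t, pi, generator, rates)
    total = sum(m)
    return [value / total for value in m]
```

With these two functions, the supremum in (3.23) reduces to a one-dimensional search over $t$ on a grid, with the integral approximated by a quadrature rule along the same grid.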



4. Regularity of the Value Function and an Optimal Strategy 



In this section we will analyze the regularity of the value function U, which will lead to the 
construction of an optimal strategy. This is done by analysis of two auxiliary sequences of functions 
converging to $U$. We begin by studying $U_0$.

Lemma 4.1. The function $U_0$ defined in (2.9) is convex in $\vec\pi$.

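Proof. Let us define a functional operator $I$ through its action on a test function $w$ by $Iw = L(w, 0)$, that is,

\[
\begin{aligned}
Iw(T, \vec\pi, a) &= \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\sigma_1 \wedge T} e^{-\rho t} C(\vec\Pi_t, a)\,dt + 1_{\{\sigma_1 \le T\}} e^{-\rho \sigma_1} w(T - \sigma_1, \vec\Pi_{\sigma_1}, a) \Big] \\
&= \int_0^T e^{-\rho u} \sum_{i \in E} m_i(u, \vec\pi) \big[ C(\vec x(u, \vec\pi), a) + \lambda_i\, S_i w\big(T - u, \vec x(u, \vec\pi), a\big) \big]\,du.
\end{aligned}
\tag{4.1}
\]

As a result of the strong Markov property of $\vec\Pi$ we observe that $U_0$ is a fixed point of $I$, and if we define

\[ k_{n+1}(T, \vec\pi, a) = I k_n(T, \vec\pi, a), \qquad k_0(T, \vec\pi, a) = 0, \qquad T \in \mathbb{R}_+,\ \vec\pi \in D,\ a \in A, \tag{4.2} \]

then $k_n \nearrow U_0$, see Proposition 1 in Costa and Davis [1989]. We will divide the rest of the proof into two parts. In the first part we will show that $k_n$ converges to $U_0$ uniformly. In the second part we will argue that $k_n$ is convex for all $n \in \mathbb{N}$. Suppose both of the above claims have been proved and let $\varepsilon > 0$. Then for any $\vec\pi_1, \vec\pi_2 \in D$ and $\alpha \in [0, 1]$,

\[
\begin{aligned}
U_0(T, \alpha \vec\pi_1 + (1 - \alpha)\vec\pi_2, a) &= U_0(T, \alpha \vec\pi_1 + (1 - \alpha)\vec\pi_2, a) - k_n(T, \alpha \vec\pi_1 + (1 - \alpha)\vec\pi_2, a) \\
&\quad + k_n(T, \alpha \vec\pi_1 + (1 - \alpha)\vec\pi_2, a) \\
&\le \varepsilon + \alpha k_n(T, \vec\pi_1, a) + (1 - \alpha) k_n(T, \vec\pi_2, a) \\
&\le 2\varepsilon + \alpha U_0(T, \vec\pi_1, a) + (1 - \alpha) U_0(T, \vec\pi_2, a),
\end{aligned}
\tag{4.3}
\]

in which the last two inequalities follow since for $n \ge N(\varepsilon)$ large enough, $|U_0(T, \vec\pi, a) - k_n(T, \vec\pi, a)| \le \varepsilon$ for all $\vec\pi \in D$. Since $\varepsilon$ was arbitrary, the convexity of $\vec\pi \mapsto U_0(T, \vec\pi, a)$ follows.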
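Step 1. Using the strong Markov property we can write $k_n$ as (cf. (3.22))

\[ k_n(T, \vec\pi, a) = \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\sigma_n \wedge T} e^{-\rho t} C(\vec\Pi_t, a)\,dt \Big]. \tag{4.4} \]

As a result,

\[
\begin{aligned}
|U_0(T, \vec\pi, a) - k_n(T, \vec\pi, a)| &\le \mathbb{E}^{\vec\pi,a}\Big[ 1_{\{T > \sigma_n\}} \int_{\sigma_n}^{T} e^{-\rho t} |C(\vec\Pi_t, a)|\,dt \Big] \\
&\le \mathbb{E}^{\vec\pi,a}\Big[ 1_{\{T > \sigma_n\}} e^{-\rho \sigma_n} c \int_0^{T - \sigma_n} e^{-\rho t}\,dt \Big] \\
&\le cT\, \mathbb{P}^{\vec\pi}\{T > \sigma_n\} \\
&\le cT\, \mathbb{E}^{\vec\pi}\big[ 1_{\{T > \sigma_n\}} (T / \sigma_n) \big] \le cT^2 \cdot \mathbb{E}^{\vec\pi}\Big[ \frac{1}{\sigma_n} \Big].
\end{aligned}
\tag{4.5}
\]

The conditional probability of the first jump satisfies $\mathbb{P}^{\vec\pi}\{\sigma_1 > t \mid M\} = e^{-I(t)}$. Therefore,

\[
\begin{aligned}
\mathbb{E}^{\vec\pi}\big[ e^{-u \sigma_1} \mid M \big] &= \mathbb{E}^{\vec\pi}\Big[ \int_0^{\infty} 1_{\{\sigma_1 \le t\}}\, u e^{-ut}\,dt \,\Big|\, M \Big] = \int_0^{\infty} \big[ 1 - e^{-I(t)} \big] u e^{-ut}\,dt \\
&\le \int_0^{\infty} \big[ 1 - e^{-\bar\lambda t} \big] u e^{-ut}\,dt = \frac{\bar\lambda}{\bar\lambda + u},
\end{aligned}
\tag{4.6}
\]

where $\bar\lambda = \max_{i \in E} \lambda_i$, see (2.11). Since the observed process $X$ has independent increments given $M$, it readily follows that $\mathbb{E}^{\vec\pi}[e^{-u \sigma_n} \mid M] \le \bar\lambda^n / (\bar\lambda + u)^n$, which immediately implies that

\[ \mathbb{E}^{\vec\pi}\big[ e^{-u \sigma_n} \big] \le \Big( \frac{\bar\lambda}{\bar\lambda + u} \Big)^n. \]

Also, since $1/\sigma_n = \int_0^{\infty} e^{-u \sigma_n}\,du$, an application of Fubini's theorem together with the last inequality yields

\[ \mathbb{E}^{\vec\pi}\Big[ \frac{1}{\sigma_n} \Big] \le \int_0^{\infty} \Big( \frac{\bar\lambda}{\bar\lambda + u} \Big)^n du = \frac{\bar\lambda}{n - 1}, \qquad n \ge 2. \tag{4.7} \]

The uniform convergence of $k_n$ to $U_0$ now follows from (4.5) and (4.7).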
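As a sanity check on the integral in (4.7), the following sketch evaluates $\int_0^\infty (\bar\lambda/(\bar\lambda + u))^n\,du$ numerically and combines (4.5) with (4.7) into the explicit rate $cT^2 \bar\lambda/(n-1)$. The function names, the substitution used to map the integral to the unit interval, and the midpoint rule are illustrative choices that do not come from the paper.

```python
def laplace_bound_integral(rate, n, steps=100_000):
    """Midpoint-rule value of the integral of (rate / (rate + u))**n over u in (0, inf), n >= 2."""
    # Substituting v = rate / (rate + u) gives du = -rate / v**2 dv,
    # so the integrand becomes rate * v**(n - 2) on the interval (0, 1).
    h = 1.0 / steps
    return sum(rate * ((k + 0.5) * h) ** (n - 2) * h for k in range(steps))


def truncation_error_bound(c, horizon, rate, n):
    """Bound c * T**2 * rate / (n - 1) on |U_0 - k_n| obtained from (4.5) and (4.7), n >= 2."""
    return c * horizon ** 2 * rate / (n - 1)
```

For instance, with $c = 1$, $T = 1$ and $\bar\lambda = 4$, the bound drops below $0.1$ once $n \ge 41$, which gives a rough idea of how many iterations of (4.2) the estimate asks for.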

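Step 2. Here, we will show that $(k_n)_{n \ge 0}$ is a sequence of convex functions. This result would follow from an induction argument once we show that the operator $I$ maps a convex function to a convex function.

Let us assume that $\vec\pi \mapsto w(T, \vec\pi, a)$ is a convex function for all $T \ge 0$. Therefore, we can write this convex mapping as $\vec\pi \mapsto w(T - u, \vec\pi, a) = \sup_{k \in K_u} \big\{ a_{k,0}(T - u) + a_{k,1}(T - u)\pi_1 + \cdots + a_{k,m}(T - u)\pi_m \big\}$, for some constants $a_{k,j}$ and countable sets $K_u$. Then using $x_j(t, \vec\pi) = m_j(t, \vec\pi) / \sum_{i \in E} m_i(t, \vec\pi)$ and the second equality in (4.1) we obtain

\[
\begin{aligned}
Iw(T, \vec\pi, a) &= \int_0^T e^{-\rho u} \sum_{i \in E} c_i(a)\, m_i(u, \vec\pi)\,du \\
&\quad + \int_0^T e^{-\rho u} \sum_{i \in E} \lambda_i m_i(u, \vec\pi) \Big[ \int_{\mathbb{R}^d} \sup_{k \in K_u} \Big\{ a_{k,0}(T - u) + \sum_{j \in E} a_{k,j}(T - u) \frac{\lambda_j f_j(y) x_j(u, \vec\pi)}{\sum_{l \in E} \lambda_l f_l(y) x_l(u, \vec\pi)} \Big\} f_i(y)\,\nu(dy) \Big] du \\
&= \int_0^T e^{-\rho u} \sum_{i \in E} c_i(a)\, m_i(u, \vec\pi)\,du \\
&\quad + \int_0^T e^{-\rho u} \Big[ \int_{\mathbb{R}^d} \sup_{k \in K_u} \Big\{ \sum_{j \in E} \big[ a_{k,j}(T - u) + a_{k,0}(T - u) \big] \lambda_j f_j(y)\, m_j(u, \vec\pi) \Big\}\,\nu(dy) \Big] du.
\end{aligned}
\tag{4.8}
\]

Since $\vec\pi \mapsto m_i(u, \vec\pi)$ is linear in $\vec\pi$ (see (2.15)) and the supremum of linear functions is convex, the convexity of $\vec\pi \mapsto Iw(T, \vec\pi, a)$ follows.

□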

Lemma 4.2. $U_0(T, \vec\pi, a)$ is continuous as a function of its first two variables.

Proof. The proof will be carried out in two parts. In the first part we will show that $\vec\pi \mapsto U_0(T, \vec\pi, a)$ is Lipschitz on $D$. In the second part we will show that $T \mapsto U_0(T, \vec\pi, a)$ is Lipschitz uniformly in $\vec\pi$. These two facts imply that $(T, \vec\pi) \mapsto U_0(T, \vec\pi, a)$ is continuous for all $a \in A$, since

\[
\begin{aligned}
|U_0(T, \vec\pi, a) - U_0(S, \vec p, a)| &= |U_0(T, \vec\pi, a) - U_0(T, \vec p, a) + U_0(T, \vec p, a) - U_0(S, \vec p, a)| \\
&\le R(T, a)\,|\vec\pi - \vec p| + R(a)\,|T - S|, \qquad \vec\pi, \vec p \in D;\ T, S \in \mathbb{R}_+,
\end{aligned}
\tag{4.9}
\]

in which $R(T, a)$ and $R(a)$ are the Lipschitz constants above.

Step 1. The idea is to use the convexity of $U_0$. Unfortunately, the convexity of $\vec\pi \mapsto U_0(T, \vec\pi, a)$ implies that this function is Lipschitz only in the interior of $D$. In what follows we will show that $\vec\pi \mapsto U_0(T, \vec\pi, a)$ is the restriction of a convex function $\vec p \mapsto \tilde U_0(T, \vec p, a)$ whose domain is strictly larger than $D$, which implies the Lipschitz continuity of $\vec\pi \mapsto U_0(T, \vec\pi, a)$ also on the boundary of the region $D$. To this end let us define the functional operator $\tilde I$ through its action on a test function $w$ as

\[ \tilde I w(T, \vec p, a) = \int_0^T e^{-\rho u} \sum_{i \in E} m_i(u, \vec p) \big[ C(\vec x(u, \vec p), a) + \lambda_i\, S_i w\big(T - u, \vec x(u, \vec p), a\big) \big]\,du, \]

for $\vec p \in \tilde D$, $T \in \mathbb{R}_+$, $a \in A$, in which

\[ \tilde D = \Big\{ \vec p \in \mathbb{R}^m_+ : \sum_{i \in E} p_i \le 2 \Big\}. \]

Note that $\tilde I$ is nothing but an extension of the operator $I$ we defined in the proof of Lemma 4.1. Let us define

\[ \tilde k_{n+1}(T, \vec p, a) = \tilde I \tilde k_n(T, \vec p, a), \qquad \tilde k_0(T, \vec p, a) = 0, \qquad T \in \mathbb{R}_+,\ \vec p \in \tilde D,\ a \in A. \]

Using the very same arguments as in the proof of Lemma 4.1, we can show that $\vec p \mapsto \tilde k_n(T, \vec p, a)$ is convex for all $n$, and that this sequence of functions converges uniformly to a convex limit $\vec p \mapsto \tilde U_0(T, \vec p, a)$. Clearly, $\tilde k_n(T, \vec p, a) = k_n(T, \vec p, a)$ when $\vec p \in D$. As a result $\tilde U_0(T, \vec p, a) = U_0(T, \vec p, a)$ on $D$. Since $\tilde U_0(T, \vec p, a)$ is locally Lipschitz in the interior of $\tilde D$ (as a result of its convexity), we see that $\vec p \mapsto U_0(T, \vec p, a)$ is Lipschitz on the compact domain $D$.

Step 2. The Lipschitz property of $U_0$ with respect to time (uniformly in $\vec\pi$) follows from

\[ |U_0(T, \vec\pi, a) - U_0(S, \vec\pi, a)| \le \mathbb{E}^{\vec\pi,a}\Big[ \int_S^T e^{-\rho t} |C(\vec\Pi_t, a)|\,dt \Big] \le c\,|T - S|. \tag{4.10} \]

□



Lemma 4.3. For all $a \in A$, $T \in \mathbb{R}_+$, the functions $(W_n(T, \cdot, a))_{n \in \mathbb{N}}$, defined in (3.20), form a sequence of convex functions. Moreover, for each $a \in A$ and $n \in \mathbb{N}$, the function $(T, \vec\pi) \mapsto W_n(T, \vec\pi, a)$ is continuous.

Proof. The proof of the convexity of $\vec\pi \mapsto W_n(T, \vec\pi, a)$ is similar to the proof of convexity of $\vec\pi \mapsto k_n(T, \vec\pi, a)$, which is defined in the proof of Lemma 4.1; see Step 2 of that proof.

The continuity proof, on the other hand, parallels the continuity proof for $(T, \vec\pi) \mapsto U_0(T, \vec\pi, a)$ which we carried out above. The proof of the uniform Lipschitz continuity of $W_n$ with respect to time is similar to the corresponding proof for $U$ in Lemma 4.5 below. □

Remark 4.1. The value function $U$ is convex in $\vec\pi$, since as a function of $\vec\pi$, $U$ is the upper envelope of the convex functions $(W_n)$.

Lemma 4.4. The value function $U$ is Lipschitz continuous in $\vec\pi$,

\[ |U(T, \vec\pi_1, a) - U(T, \vec\pi_2, a)| \le R(T, a)\,|\vec\pi_1 - \vec\pi_2|, \qquad \vec\pi_1, \vec\pi_2 \in D;\ T \le T_0;\ a \in A, \tag{4.11} \]

where the positive constant $R$ depends on $T$ and $a$.

Proof. The proof parallels Step 1 of the proof of Lemma 4.2. Again a sequence of convex functions is constructed, converging upwards to an extension of $U$ onto the enlarged domain (each element in this sequence is an extension of $W_n$ onto the larger domain). Here, the convergence is not uniform but monotone. The result still follows since the upper envelope of convex functions is convex, so that the limit is convex and therefore Lipschitz in $\vec\pi$ on the original domain $D$. □

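Lemma 4.5. The value function $U$ is continuous in $T$ uniformly in the other variables, namely

\[ |U(T, \vec\pi, a) - U(S, \vec\pi, a)| \le c\,|T - S|, \qquad \text{for any } \vec\pi \in D;\ T, S \in [0, T_0];\ a \in A. \tag{4.12} \]

Proof. Fix $S > T$. Let $\xi^S$ and $\xi^T$ be $\varepsilon$-optimal strategies for $U(S, \vec\pi, a)$ and $U(T, \vec\pi, a)$ respectively. Then, taking $\tilde\xi = \xi^T 1_{[0,T]} + \xi^T_T 1_{(T,S]}$ we have

\[
\begin{aligned}
U(S, \vec\pi, a) - U(T, \vec\pi, a) &\ge J^{\tilde\xi}(S, \vec\pi, a) - \big( J^{\xi^T}(T, \vec\pi, a) + \varepsilon \big) \\
&= \mathbb{E}^{\vec\pi,a}\Big[ \int_T^S e^{-\rho s} C(\vec\Pi_s, \xi^T_T)\,ds \Big] - \varepsilon \ge -(S - T)c - \varepsilon.
\end{aligned}
\]

On the other hand, using the strong Markov property of $(\vec\Pi, \xi^S)$ and $K \ge 0$,

\[
\begin{aligned}
U(S, \vec\pi, a) - U(T, \vec\pi, a) &\le J^{\xi^S}(S, \vec\pi, a) + \varepsilon - J^{\xi^S 1_{[0,T]}}(T, \vec\pi, a) \\
&\le \mathbb{E}^{\vec\pi,a}\Big[ \int_T^S e^{-\rho s} C(\vec\Pi_s, \xi^S_s)\,ds \Big] + \varepsilon \\
&\le e^{-\rho T}(S - T)c + \varepsilon.
\end{aligned}
\]

Since $\varepsilon$ was arbitrary, we conclude that $|U(T, \vec\pi, a) - U(S, \vec\pi, a)| \le c\,|T - S|$ as desired. □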

Lemma 4.6. For each $a \in A$ and $n$, the function $(T, \vec\pi) \mapsto V_n(T, \vec\pi, a)$ is continuous.

Proof. We proved in Lemma 4.2 that $(T, \vec\pi) \mapsto U_0(T, \vec\pi, a)$ is continuous. Furthermore, observe that the operator $\mathcal{M}$ preserves continuity: if for all $a \in A$, $(T, \vec\pi) \mapsto V(T, \vec\pi, a)$ is continuous, then for $(T_1, \vec\pi_1)$ and $(T_2, \vec\pi_2)$ close enough

\[ |\mathcal{M}V(T_1, \vec\pi_1, a) - \mathcal{M}V(T_2, \vec\pi_2, a)| \le \max_{b \in A} |V(T_1, \vec\pi_1, b) - V(T_2, \vec\pi_2, b)| \tag{4.13} \]

is small.

The rest of the proof follows from the properties of the operator $\mathcal{Q}$ in (3.3). Indeed, $\mathcal{Q}w(\cdot, \cdot, a)$ defines an optimal stopping problem for $\vec\Pi$ with terminal reward function $\mathcal{M}w(\cdot, \cdot, a)$. As shown in Corollary 3.1 of Ludkovski and Sezer [2007] (see also Remark 3.4 in Bayraktar and Sezer [2006]), when $\mathcal{M}w$ is continuous, the value function $\mathcal{Q}w$ of this optimal stopping problem is also continuous. Therefore, by induction, $V_{n+1} = \mathcal{Q}V_n$ is continuous.

□

Corollary 4.1. The value function $U(\cdot, \cdot, a)$ is continuous for all $a \in A$. Moreover, $(V_n(\cdot, \cdot, a))_{n \in \mathbb{N}}$, defined in (3.4), and $(W_n(\cdot, \cdot, a))_{n \ge 0}$, defined in Proposition 3.4, both converge to $U(\cdot, \cdot, a)$ uniformly for all $a \in A$.

Proof. Lemmas 4.4 and 4.5 imply the continuity of $U(\cdot, \cdot, a)$ (also see (4.9)). The rest of the statement of the corollary follows from Dini's theorem, which states that monotone pointwise convergence of continuous functions to a continuous limit implies uniform convergence on compacts. □

Using Corollary 4.1 we obtain the following explicit existence result about an optimal strategy for $U$:

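Proposition 4.1. Let us extend the value functions $U_0$ and $U$ so that

\[ U_0(T, \vec\pi, a) = U(T, \vec\pi, a) = 0, \qquad T \in [-\varepsilon, 0),\ \vec\pi \in D,\ a \in A, \tag{4.14} \]

for some strictly positive constant $\varepsilon$. Let us recursively define a strategy $\xi^* = (\xi_0, \tau_0; \xi_1, \tau_1, \ldots)$ via $\xi_0 = a$, $\tau_0 = 0$ and

\[
\begin{aligned}
\tau_{k+1} &= \inf\{ s \in [\tau_k, T] : U(T - s, \vec\Pi(s), \xi_k) = \mathcal{M}U(T - s, \vec\Pi(s), \xi_k) \}; \\
\xi_{k+1} &= d_{\mathcal{M}U}(T - \tau_{k+1}, \vec\Pi(\tau_{k+1}), \xi_k), \qquad k = 0, 1, \ldots,
\end{aligned}
\]

with the convention that $\inf \emptyset = T + \varepsilon$. Then $\xi^*$ is an optimal strategy for (2.6), i.e.,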

/V^^c(n(s),c)c?s- J2 e-''^'=ir(a,a+i,n(r.)) 

Jo , -rn 



(4.15) 



(4.16) [/(T, 7f, a) = E"''* 
Proof. We will show that for n = 1,2, . . . 



k:Tk<T 



(4.17) E""'" 



n-1 



e-^^c(n(s),e.)rfs-J]e-''^'=K(a,a+i,n(r-fe)) 

= U{T, TT, a) - E-^''^ e-^^"f/(T - r„, n(r„), 6) 



fe=0 



Suppose that (4.17) is true. Then 



E^ 



(4.18) 



re-P'C{li{s),is) ds-J2 ^~'^'K{ik, a+i, n(r.)) 
.•^0 fe=0 



U{T, 7f, a) - E^''^ e~P^-U{T - t^, n(x„), e„) 



+ E" 



e~P-"f/o(T-r„,n(r„),e„) 



Taking the limit as $n \to \infty$ and using the bounded convergence theorem and $\tau_n \to T + \varepsilon$, we have that



U{T,TT,a) = E"'" 



< E" 



r e-''^Cili{s),^s)ds- J2 e-^^'=i^(a,a+i,n(rfc)) 

•^0 1 



k:T^:<T 



since $K(a, b, \vec\pi) \ge 0$, and equation (4.16) follows.

To establish (4.17) we proceed by induction. The functions $U(\cdot, \cdot, a)$ and $\mathcal{M}U(\cdot, \cdot, a)$ are continuous by Corollary 4.1. As a result the stopping time

\[ \tau_1 = \inf\{ s \in [0, T] : U(T - s, \vec\Pi(s), a) = \mathcal{M}U(T - s, \vec\Pi(s), a) \} \tag{4.19} \]

satisfies

\[ U(T, \vec\pi, a) = \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tau_1} e^{-\rho s} C(\vec\Pi(s), a)\,ds + e^{-\rho \tau_1} \mathcal{M}U(T - \tau_1, \vec\Pi(\tau_1), a) \Big], \tag{4.20} \]

see e.g. Proposition 5.12 in Bayraktar et al. [2006]. Rearranging and using $\xi_1 = d_{\mathcal{M}U}(T - \tau_1, \vec\Pi(\tau_1), a)$,

\[ \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\tau_1} e^{-\rho s} C(\vec\Pi(s), \xi_0)\,ds - e^{-\rho \tau_1} K(\xi_0, \xi_1, \vec\Pi(\tau_1)) \Big] = U(T, \vec\pi, a) - \mathbb{E}^{\vec\pi,a}\big[ e^{-\rho \tau_1} U(T - \tau_1, \vec\Pi(\tau_1), \xi_1) \big], \tag{4.21} \]

proving (4.17) for $n = 1$. Perhaps we should emphasize the dependence on $T$ on the left-hand side of (4.21) by inserting $T$ as another superscript above $\mathbb{E}$ (we are conditioning on the strong Markov process $t \mapsto (T - t, \vec\Pi(t), \xi_t)$). Although we are not going to implement this, for notational consistency and convenience, one should keep this point in mind when reading the rest of the proof.

Assume now that (4.17) is satisfied for some $n \ge 1$; we will prove that it also holds when we replace $n$ by $n + 1$. Since the $\tau_n$'s are all hitting times we have that $\tau_{n+1} = \tau_n + \tau_1 \circ \theta_{\tau_n}$.

(4.22) 



e-'''C{U{s),Qds - J2 ^~'^'mk, a+i, n(rfc)) 



A;=0 



e~P'C{fl{s),a)ds 



n-l 



J2 e-'^'Ki^k, n(rfe)) + e-''^"En(^")'«" 



fc=0 



Using (4.21) we can then write 



E^ 



(4.23) 



E" 



e-P-"f/(r-r„,n(r„),e„ 



-pT„+l 



U(T - r„+i,n(r„+i),^„+i; 



Using (4.22) and (4.23) together with the induction hypothesis, we obtain (4.17) when n is replaced 
by n + 1. 

□ 



Let

\[
\begin{aligned}
C_s(a) &:= \{ \vec\pi \in D : U(s, \vec\pi, a) > \mathcal{M}U(s, \vec\pi, a) \}, \\
\Gamma_s(a) &:= \{ \vec\pi \in D : U(s, \vec\pi, a) = \mathcal{M}U(s, \vec\pi, a) \}
\end{aligned}
\tag{4.24}
\]

denote the continuation and switching regions for initial policy $a$ with $s$ time units until maturity. The switching region can further be decomposed as the union $\bigcup_{b \in A} \Gamma_s(a, b)$ of the regions defined as

\[ \Gamma_s(a, b) = \{ \vec\pi \in D : U(s, \vec\pi, a) = U(s, \vec\pi, b) - K(a, b, \vec\pi) \}, \qquad b \in A. \tag{4.25} \]

The results above imply that to solve (3.13) with initial horizon $T$, one maintains the initial policy $a$ and observes the process $\vec\Pi$ until time $\tau_1 = \tau_1(T)$, at which point $\vec\Pi$ enters the region $\Gamma_{T - \tau_1}(a)$. At this time, if $\vec\Pi_{\tau_1}$ is in the set $\Gamma_{T - \tau_1}(a, b)$ we take $\xi_1 = b$; that is, we select the $b$-th policy in the policy set $A$. The boundaries of $\Gamma_s(a, b)$ are termed switching boundaries and provide an efficient way of summarizing the optimal strategy of the controller. We plot these curves in our examples in Section 6.
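The decision rule encoded by the continuation and switching regions can be sketched as follows, for a single fixed pair $(s, \vec\pi)$. The sketch assumes that $d_{\mathcal{M}U}$ denotes a maximizer in the definition of $\mathcal{M}U$ and that the maximum is taken over policies $b \ne a$; the dictionaries `values` and `costs`, the tolerance `tol`, and the numbers in the example are hypothetical and only stand in for numerically computed quantities.

```python
def intervention(values, costs, a):
    """Return (max over b != a of values[b] - costs[(a, b)], a maximizing b)."""
    best_b = max((b for b in values if b != a), key=lambda b: values[b] - costs[(a, b)])
    return values[best_b] - costs[(a, best_b)], best_b


def classify(values, costs, a, tol=1e-9):
    """Return ('continue', a) in the continuation region, else ('switch', b) for the maximizing b."""
    switch_value, best_b = intervention(values, costs, a)
    # values[a] stands for U(s, pi, a); being strictly above the intervention value means "keep a".
    if values[a] > switch_value + tol:
        return ("continue", a)
    return ("switch", best_b)


# Hypothetical values of U(s, pi, .) at one point (s, pi), with symmetric switching costs.
example_values = {1: 0.60, 2: 0.55}
example_costs = {(1, 2): 0.05, (2, 1): 0.05}
decision_from_1 = classify(example_values, example_costs, a=1)
decision_from_2 = classify(example_values, example_costs, a=2)
```

Scanning such a classification over a grid of $(s, \vec\pi)$ values is one way to trace the switching boundaries; the tolerance is needed because a numerically computed $U$ only satisfies $U = \mathcal{M}U$ approximately on the switching region.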

5. Extensions 

5.1. Infinite Horizon Formulation. In many practical settings, the controller does not have a natural horizon for her strategies. In such cases it is more appropriate to consider an infinite-horizon setting. Due to time-homogeneity, the infinite-horizon problem is stationary in time, reducing the dimension by one. In particular, the optimal strategy can be summarized with a single switching-boundary plot, as the regions $\Gamma_s(a)$ are independent of $s$.

For $\rho > 0$, let

\[ V_\rho(\vec\pi, a) = \sup_{\xi \in \mathcal{U}(\infty)} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\infty} e^{-\rho t} C(\vec\Pi(t), \xi_t)\,dt - \sum_{k} e^{-\rho \tau_k} K(\xi_{k-1}, \xi_k, \vec\Pi(\tau_k)) \Big]. \tag{5.1} \]

Here $\mathcal{U}(\infty)$ denotes the admissible strategies that satisfy $\mathbb{E}^{\vec\pi,a}\big[ \sum_k e^{-\rho \tau_k} K(\xi_{k-1}, \xi_k, \vec\Pi(\tau_k)) \big] < \infty$.

The next proposition shows that the infinite horizon problem can be uniformly approximated by the finite horizon problems. In fact, the convergence is exponentially fast in the time horizon $T$.

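Proposition 5.1. There exists a constant $R$ such that

\[ |U(T, \vec\pi, a) - V_\rho(\vec\pi, a)| \le e^{-\rho T} R. \tag{5.2} \]

Proof. Let $\xi^T$ be an $\varepsilon$-optimal strategy of $U(T, \vec\pi, a)$ and $\tilde\xi = \xi^T(t) 1_{[0,T]} + \xi^T_T 1_{(T,\infty)} \in \mathcal{U}(\infty)$. Then

\[
\begin{aligned}
V_\rho(\vec\pi, a) - U(T, \vec\pi, a) &\ge \mathbb{E}^{\vec\pi,a}\Big[ \int_T^{\infty} e^{-\rho s} C(\vec\Pi_s, \tilde\xi_s)\,ds \Big] - \varepsilon \\
&\ge -e^{-\rho T} \int_0^{\infty} e^{-\rho s} c\,ds - \varepsilon \ge -e^{-\rho T} c/\rho - \varepsilon.
\end{aligned}
\]

On the other hand, using an $\varepsilon$-optimal control of $V_\rho(\vec\pi, a)$,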

POO 

Vp{n, a) - U{T, n, a) < E"'" 



< E' 



7r,a 



e-^^c(n.,C)c?.- J2 e-^^'=K(er_i,er,n.r) 

k:Tk>T 

r"00 



+ e 







+ $\varepsilon$
for some constant $\tilde R$, where the last line used the fact that the inner term, which is the infinite-horizon counterpart of $U_0$, is uniformly bounded on the compact domain $D \times A$. Taking $R = \max(\tilde R, c/\rho)$ the proposition follows. □
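The exponential rate in (5.2) is easiest to see in the degenerate case of a constant running rate $c$ and no switching, where the finite- and infinite-horizon integrals are available in closed form and differ by exactly $e^{-\rho T} c/\rho$, the term appearing in the first half of the proof. The sketch below covers only this special case; the function names are illustrative.

```python
import math


def discounted_integral(c, rho, horizon):
    """Integral of exp(-rho * s) * c over s in (0, horizon), for rho > 0; horizon may be math.inf."""
    if math.isinf(horizon):
        return c / rho
    return c * (1.0 - math.exp(-rho * horizon)) / rho


def horizon_gap(c, rho, horizon):
    """Infinite-horizon integral minus the finite-horizon one; equals exp(-rho * T) * c / rho."""
    return discounted_integral(c, rho, math.inf) - discounted_integral(c, rho, horizon)
```

For example, with $c = 2$ and $\rho = 0.5$ the gap at $T = 3$ is $4e^{-1.5}$, and it shrinks by a factor $e^{-\rho}$ for each additional unit of horizon.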

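The characterization of the value function of the infinite horizon problem, which we give below, follows along the same lines as in Section 4.

Proposition 5.2. $V_\rho$ is the smallest fixed point of the operator $L_\rho(V) = L_\rho(V, \mathcal{M}V)$ where

\[ L_\rho(V, H)(\vec\pi, a) = \sup_{t \ge 0} \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{t \wedge \sigma_1} e^{-\rho s} C(\vec\Pi_s, a)\,ds + 1_{\{t < \sigma_1\}} e^{-\rho t} H(\vec\Pi_t, a) + 1_{\{t \ge \sigma_1\}} e^{-\rho \sigma_1} V(\vec\Pi_{\sigma_1}, a) \Big] \]

and

\[ \mathcal{M}V(\vec\pi, a) = \max_{b \in A,\, b \ne a} \{ V(\vec\pi, b) - K(a, b, \vec\pi) \}. \]

Note that $L_\rho$ is given by

\[ L_\rho w(\vec\pi, a) = \sup_{t \ge 0} \Big\{ \Big( \sum_{i \in E} m_i(t, \vec\pi) \Big) e^{-\rho t}\, \mathcal{M}w\big(\vec x(t, \vec\pi), a\big) + \int_0^t e^{-\rho u} \sum_{i \in E} m_i(u, \vec\pi) \big[ C(\vec x(u, \vec\pi), a) + \lambda_i\, S_i w\big(\vec x(u, \vec\pi), a\big) \big]\,du \Big\}, \tag{5.3} \]

where

\[ S_i w(\vec\pi, a) = \int_{\mathbb{R}^d} w\Big( \Big( \frac{\lambda_1 f_1(y) \pi_1}{\sum_{j \in E} \lambda_j f_j(y) \pi_j}, \ldots, \frac{\lambda_m f_m(y) \pi_m}{\sum_{j \in E} \lambda_j f_j(y) \pi_j} \Big), a \Big) f_i(y)\,\nu(dy), \qquad i = 1, \ldots, m, \]

for a bounded function $w(\cdot, \cdot)$ defined on $D \times A$ only. The optimal stopping time for $V_\rho$ is now the first entrance time $\tau_0(\vec\pi)$ of the process $\vec\Pi$ to the time-stationary region

\[ \Gamma(a) = \{ \vec\pi \in D : V_\rho(\vec\pi, a) = \mathcal{M}V_\rho(\vec\pi, a) \}. \]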



To compute $V_\rho$ we define again

\[ W_0(\vec\pi, a) = \mathbb{E}^{\vec\pi,a}\Big[ \int_0^{\infty} e^{-\rho s} C(\vec\Pi_s, a)\,ds \Big] \qquad \text{and} \qquad W_{n+1} = L_\rho W_n. \]

Then as in Section 4, it can be shown that $W_n \nearrow V_\rho$, and $W_n$ can be computed numerically by using (5.3).

5.2. Costs Incurred at Arrival Times. In many practical settings the arrivals of $X$ are themselves costly, which leads us to consider a running cost structure of the form

N{t) 

i=i 

where $c_i : \mathbb{R}^d \times A \to \mathbb{R}$ (with $\int_{\mathbb{R}^d} c_i(y, a)\,\nu_i(dy) < \infty$ for all $i \in E$, $a \in A$) is the cost incurred upon an arrival of size $Y_j$ when the controller has policy $a$ in place and the environment is $M_{\sigma_j} = i$. Above, $N(t)$ is the number of arrivals by time $t$, and $(\sigma_j, Y_j)$ are the arrival times and marks respectively. As an example, see Section 6.3 below.

In the latter case, setting C{y, vf, a) = J2i a) one deals with the objective function 



(5.4) U{T, vf, a) = sup E"'" 

?eW(T) 



N{T) 

J2 e~''"^c(r„n(or,),e.,) - 5^e-^^^K(a-i,a,n(rfc)) 



by solving the equivalent coupled stopping problem 

'n{t) 



U{T, vf, a) = sup E"-'* 
Te-s(T) 



e'P^'^CiYj, n(aj), a) + e'^^MU (t - r, n(r) 



as in Proposition 2.6. One can easily verify that the function U is the smallest fixed point greater 
than Uq of the operator L whose action on a test function w is 

Lw{T, vr, a) = sup < ( > mi{t, vr) j ■ e~^* ■ Aiw (T — t, x{t, vr), a) 
o<t<T I / 

+ / e~^^2.^ii'^^^)''^i{ / C{y,x{u,n),a)ui{dy) + Siw{T — u,x{u,Tf),a) \ du>. 
Jo \Jr'' J ) 



6. Numerical Illustrations 

Below we provide numerical examples illustrating our model based on the applications outlined in Section 1.1. The numerical implementation proceeds by discretizing the time horizon $[0, T]$ and then directly finding the deterministic supremum over $t$ in (3.23). Similarly, the domain $D$ is also discretized and linear interpolation is used for evaluating the jump operator $S$ of (3.24). Because the algorithm proceeds forward in time with $t = 0, \Delta t, \ldots, T$, for a given time-step $t = m\Delta t$ the right-hand side in (3.23) is known and one may obtain $U(m\Delta t, \vec\pi, a)$ directly.
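The interpolation step for the jump operator can be sketched in the simplest setting: two hidden states and a simple Poisson process without marks, so that the belief is summarized by $\pi_1$ and the operator in (3.24) reduces to evaluating $w$ at the post-arrival belief. The uniform grid on $[0, 1]$, the function names and the restriction to two states are assumptions made for illustration; they are not a description of the implementation used for the examples.

```python
def bayes_jump(pi1, rates):
    """Posterior probability of state 1 right after an arrival of a simple Poisson process."""
    weight1 = rates[0] * pi1
    weight2 = rates[1] * (1.0 - pi1)
    return weight1 / (weight1 + weight2)


def interpolate(grid_values, pi1):
    """Linear interpolation of values stored on the uniform grid k / (len - 1), k = 0, ..., len - 1."""
    n = len(grid_values) - 1
    position = min(max(pi1, 0.0), 1.0) * n
    left = min(int(position), n - 1)
    frac = position - left
    return (1.0 - frac) * grid_values[left] + frac * grid_values[left + 1]


def jump_operator(grid_values, pi1, rates):
    """Evaluate the gridded function w at the post-arrival belief (no marks, two states)."""
    return interpolate(grid_values, bayes_jump(pi1, rates))
```

With intensities $(1, 4)$, an arrival observed at $\pi_1 = 0.5$ moves the belief to $0.5 / (0.5 + 2) = 0.2$, which generally falls between grid points; this is where the interpolation enters.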

On the infinite horizon, since there is no time variable, the dynamic programming equation (5.3) is coupled. Accordingly, one must use the iterative sequence $W_n$, as detailed in Section 5.1. Namely, one first computes $W_0 = U_0$ by iterating (4.2), and then applies $L_\rho$ several times to find a suitably good approximation $W_n$.

6.1. Optimal Tracking of 'On-Off' System. Consider a physical system (for example a military radar) that can be in two states $E = \{1, 2\}$. Information about the system is obtained via a point process $X$ that summarizes observations. The controller wishes to track the state of the system by announcing at each time $0 \le t \le T$ whether the current state is $a = 1$ or $a = 2$, so that $A = \{1, 2\}$. The controller faces a penalty if her announcement is incorrect; namely, a running benefit is assessed at rate $c_1(1)\,dt$ (respectively, $c_2(2)\,dt$) if the controller declares $\xi_t = 1$ and indeed $M_t = 1$ (resp. $\xi_t = 2$ and $M_t = 2$). If the controller is incorrect then no benefit is received. Moreover, the controller faces fixed costs $K(1, 2)$ (resp. $K(2, 1)$) from switching her announcement from state 1 to state 2 (resp. from state 2 to state 1). The costs $K(a, b)$ represent the effort for disseminating new information, alerting other systems, triggering event protocols, etc. A case in point is the alert announcements by the Department of Homeland Security regarding the terrorist threat level, which receive major coverage in the media and have significant nationwide implications with high associated costs. Thus, both in the case of an upgrade and in the case of a downgrade, specific protocols must be followed by appropriate government and corporate departments. These effects imply that alert levels should be changed only when significant changes occur in the controller's beliefs.

To illustrate we take without loss of generality $c_1(1) = c_2(2) = 1$, $c_1(2) = c_2(1) = 0$ and first consider $K(1, 2) = K(2, 1) = 0.05$, $\rho = 0$, $T = 1$. We assume that $X$ is a simple Poisson process with corresponding intensities $\lambda(M) = [1, 4]$, so that arrivals are much more likely in the 'alarm' state 2. Finally, the generator of $M$ is



so that on average an alarm should be declared $\lim_{t \to \infty} \mathbb{P}\{M_t = 2\} = 25\%$ of the time.
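The long-run fraction of time spent in the alarm state follows from the stationary law of a two-state chain. The sketch below uses hypothetical jump rates `q12` and `q21`, chosen only so that state 2 is occupied 25% of the time and so that the rate of leaving state 2 is the larger one; they are not taken from the generator of the example.

```python
def stationary_two_state(q12, q21):
    """Stationary law (p1, p2) of a two-state chain with jump rates q12 (1 to 2) and q21 (2 to 1)."""
    total = q12 + q21
    return q21 / total, q12 / total


# Hypothetical rates, chosen only so that state 2 is occupied 25% of the time.
p1, p2 = stationary_two_state(q12=1.0, q21=3.0)
```

Any pair of rates with $q_{21} = 3 q_{12}$ gives the same stationary law; the absolute size of the rates determines how quickly $M$ moves between the two states.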

Figure 1 shows the results, in particular the switching regions $\Gamma_s(a, b)$. We observe a highly nontrivial dependence of the switching boundaries on time to maturity. First, very close to maturity, no switching takes place at all, as the fixed switching costs $K$ dominate any possible gain to be made. For small $s$, the no-switching region $C_s(a)$ is very large, because the controller is reluctant to change her announcement close to maturity. On the other hand, we observe that the switching region in policy 1 narrows between medium $s \approx 0.2$ and large $s$. This happens again due to the finite horizon. With $s = 0.2$, when the controller believes that $M_t = 2$ with high probability, it is unlikely that $M_t$ will change again before maturity, so that the optimal strategy is to pay the switching cost $K(1, 2)$ and plan to maintain policy 2 until expiration. On the other hand, for large $s > 0.5$, even when $\mathbb{P}\{M_t = 2\} = 1 - \pi_1$ is quite large, the controller knows that soon enough $M$ is likely to return to state 1 (since $q_{2,1}$ is large); rather than do two switches and track $M$, the controller takes a shortcut and continues to maintain policy 1 (with the knowledge that her error is likely to be short-lived). This "short-circuiting" will disappear only when $\pi_1$ is extremely small. Note that this phenomenon is one-sided: because $q_{1,2}$ is small, the upper boundary $\Gamma_s(2, 1)$ is monotonically decreasing over time.




Figure 1. Sequential tracking of a two-state Markov chain. The left panel shows 
the value functions $U(T, \vec\pi, a)$, $a \in \{1, 2\}$, as a function of $\pi_1$ for T = 1. Recall that in 
this case $D = \{(\pi_1, 1 - \pi_1) : 0 \le \pi_1 \le 1\}$. The vertical lines indicate the boundaries of 
$\Gamma_T(1, 2)$ and $\Gamma_T(2, 1)$. The right panel shows the switching regions $\Gamma_s(a, b)$ (namely 
$\Gamma_s(1, 2)$ is below the lower curve and $\Gamma_s(2, 1)$ is above the higher curve) as a function 
of time to maturity s. 



6.2. Policy Making Example. The Federal Reserve Board (the Fed) has the task of adjusting 
the US monetary policy in response to economic events. The Fed has authority over the overnight 
interest rates and attempts to implement a loose monetary policy when the economy is weak, and 
a tight monetary policy when the economy is overheating. Unfortunately, the current state of the 
economy M is never precisely known; thus the main task of the Fed is to estimate M from various 
economic information it collects. When the beliefs of the Fed change sufficiently, it will adjust 
its monetary policy $\xi$. Such adjustments are expensive, since they are closely followed by market 
participants and send out important signals to economic agents. Thus, beyond trying to track M, 
the Fed also seeks stability in its policies, in order not to disrupt planning activities of businesses. 

As can be seen from this description, this problem fits well into our tracking paradigm of (1.3). 
For concreteness, let $M = \{M_t\}_{t \ge 0}$ represent the current economy with state space E = {1, 2, 3} = 
{Overheating, Growth, Recession}. The generator of M is taken to be 

$$Q = \begin{pmatrix} -4 & 3 & 1 \\ 2 & -4 & 2 \\ 0 & 3 & -3 \end{pmatrix}.$$

Figure 2. Value function $U(T, \vec\pi, a)$ of the Fed policy-making example in Section 
6.2, plotted together with the switching regions $\Gamma_T(a, \cdot)$ for each current policy a 
(panels: a = 0, a = 1, a = 2). 



Thus, M moves randomly between all three states (and we assumed that a recession cannot be 
immediately followed by overheating). In the face of these three states, the Fed also has three 
policy levels, namely its action set is $\mathcal{A} = \{0, 1, 2\}$ = {Tight, Normal, Accommodating}. 
The cost function $C(\vec\pi, a) = \sum_{i \in E} c_i(a)\,\pi_i$ is given by the matrix 


$(c_i(a))$, $i \in E$, $a \in \mathcal{A}$ [matrix layout lost in extraction; surviving entries: 2; -1, -1; -1, -1]. 


The switching costs are given by $K(a, b) = 0.05 \cdot 1_{\{a \ne b\}}$ for $a, b \in \mathcal{A}$. The observation process X is 
a simple Poisson process with M-modulated intensity $\lambda = [\lambda_1, \lambda_2, \lambda_3] = [1, 2, 5]$. Thus, the worse 
the state of the economy, the more frequent are the (negative) events observed by the Fed. 

Figure 2 illustrates the obtained results for T = 4 and no discounting. The triangular regions in 
Figure 2 are the state space $D = \{\vec\pi \in \mathbb{R}^3_+ : \pi_{Ovr} + \pi_{Gro} + \pi_{Rec} = 1\}$. The respective panels show 
how the initial switching regions $\Gamma_T(a, \cdot)$ and value functions $U(T, \vec\pi, a)$ depend on the current 
policy a. Observe that because the penalty for not tracking recessions is small, starting out in the 
'Normal' regime, the Fed will never immediately adopt an 'Accommodating' policy, $\Gamma_T(1, 2) = \emptyset$. 
Similarly, because the penalty for missing an overheating economy is very large, the switching 
regions into a 'Tight' policy are large and, conversely, the continuation region $C_T(0)$ is large. Also, 
observe that the value function appears not to be differentiable at the boundaries. Finally, we 
stress that because of the finite horizon, this problem is again non-time-stationary and the solution 
(as well as $\Gamma_T(a, \cdot)$) depends on the remaining time T. 

6.3. Customer Call Center Example. Our last example illustrates the structure of the infinite 
horizon version together with a different cost structure. We consider a call center application that 
employs a variable number of servers to answer calls. The calling rate fluctuates and is modulated 
by the unknown environment variable M. Having more servers decreases the per-call costs, but 
increases fixed costs related to payroll overhead. 

We assume that $M_t \in E$ = {Low, Med, High} with a generator 

$$Q = \begin{pmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -1 \end{pmatrix}.$$

The observed process X represents the actual received calls and is taken to be a compound Poisson 
process with intensity $\lambda(M_t)$ and marks $Y_1, Y_2, \ldots$ that represent intrinsic call costs. Suppose that 
$Y \in \{6, 12, 24\}$, and the distribution of Y and $\lambda$ is M-modulated: 

$$\nu_{i,j} = \mathbb{P}\{Y = y_j \mid M_t = i\} = \begin{pmatrix} 1/4 & 1/2 & 1/4 \\ 1/3 & 1/3 & 1/3 \\ 1/4 & 1/4 & 1/2 \end{pmatrix}; \qquad \lambda = [1, 3, 4].$$

Thus, as the manager receives calls, she dynamically updates her beliefs about the current state of M 
based on the intervals between call times and the observed call types. 
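
The belief update described here has two parts, which the sketch below separates: a deterministic drift while no call arrives, and a Bayes reweighting when a call with a given mark arrives. It is a minimal illustration, assuming the standard filter for a Markov-modulated compound Poisson process (unnormalized belief evolving as a row vector under Q - diag(lambda) between arrivals, then multiplied by lambda_i nu_{i,j} at an arrival). The generator is read with its zero entries restored, the Euler discretization and all function names are choices made for this example, and marks are indexed 0, 1, 2 for the call costs 6, 12, 24.

```python
Q = [[-1.0, 1.0, 0.0], [1.0, -2.0, 1.0], [0.0, 1.0, -1.0]]  # Low, Med, High
LAM = [1.0, 3.0, 4.0]
NU = [[1/4, 1/2, 1/4], [1/3, 1/3, 1/3], [1/4, 1/4, 1/2]]

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def flow(pi, dt, steps=1000):
    """Belief after dt time units without arrivals (explicit Euler steps
    on dx/dt = x (Q - diag(LAM)), renormalized after every step)."""
    n, h, x = len(pi), dt / steps, list(pi)
    for _ in range(steps):
        dx = [sum(x[i] * Q[i][j] for i in range(n)) - LAM[j] * x[j]
              for j in range(n)]
        x = normalize([x[j] + h * dx[j] for j in range(n)])
    return x

def mark_update(pi, j):
    """Belief right after an arrival whose mark has index j."""
    return normalize([pi[i] * LAM[i] * NU[i][j] for i in range(len(pi))])

def filter_step(pi, dt, j):
    """Wait dt without calls, then observe a call with mark index j."""
    return mark_update(flow(pi, dt), j)

# Hypothetical usage: start certain of 'Med', wait 0.5, see a call of cost 12.
belief = filter_step([0.0, 1.0, 0.0], 0.5, 1)
print(belief)
```

Long gaps between calls pull the belief toward the low-intensity state, while an expensive call (mark index 2) favors 'High' both through the intensity and through the mark distribution.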

The call center manager can choose one of two strategies, namely she can employ either one or 
two agents, $a \in \mathcal{A} = \{1, 2\}$. Employing a agents leads to per-call costs of $c_1(Y, a) = -Y/a$ and to 
continuously-assessed costs of $c_2(Y, a) = -(10 + 20a)$. Thus, when $\mathbb{P}\{M_t = \text{High}\}$ is sufficiently 
high, it is optimal to employ both agents, otherwise one is sufficient. Finally, switching costs for 
increasing or decreasing the number of agents are set at $K(a, b) = 2$. Note that here all the costs are 
independent of M (and hence of $\vec\Pi$). 

We consider an infinite horizon formulation and take $\rho = 0.5$. The parameter $\rho$ measures the 
trade-off between minimizing immediate costs and having a long-term strategy that takes into 
account future changes in M. Thus $\rho = 0.5$ means that the horizon of the controller is on the 
time-scale of $1/\rho = 2$ time periods. The overall objective is: 


$$\sup \;\cdots\; \sum_{i=1} \cdots \int_0^{\infty} \cdots$$


Figure 3 shows the results, as well as a computed color-coded sample path of $\vec\Pi$ which shows 
the implemented optimal strategy. The given path has four jumps and three policy changes (two 
changes occur between jumps when $\vec\Pi$ enters $\Gamma(1, 2)$, and one change occurs at an arrival when 
$\vec\Pi$ jumps back into $\Gamma(2, 1)$). Observe that in the absence of new information, $\vec\Pi$ converges to the 
fixed point $\vec\pi_\infty = [0.7, 0.23, 0.07]$ (the invariant distribution of $e^{t(Q - \Lambda)}$), as can be seen from the flow 
of the paths in Figure 3. 
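
The quoted fixed point can be checked numerically. The sketch below assumes that, between arrivals, the belief moves along the normalized flow of Q - diag(lambda), so that its rest point is the positive left eigenvector of that matrix; it finds this vector by power iteration on I + h(Q - diag(lambda)). The generator is read with its zero entries restored, and the step size h and iteration count are arbitrary choices for this example.

```python
def flow_fixed_point(Q, lam, h=0.1, iters=500):
    """Normalized dominant left eigenvector of A = Q - diag(lam), computed
    by power iteration on I + h*A (h small enough that I + h*A >= 0)."""
    n = len(lam)
    A = [[Q[i][j] - (lam[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    x = [1.0 / n] * n
    for _ in range(iters):
        x = [x[j] + h * sum(x[i] * A[i][j] for i in range(n))
             for j in range(n)]
        s = sum(x)
        x = [v / s for v in x]
    return x

Q = [[-1.0, 1.0, 0.0], [1.0, -2.0, 1.0], [0.0, 1.0, -1.0]]  # Low, Med, High
lam = [1.0, 3.0, 4.0]
pi_inf = flow_fixed_point(Q, lam)
print([round(v, 2) for v in pi_inf])
```

Note that this rest point differs from the stationary distribution of Q itself, which is uniform for this generator: a long silence is evidence for the low-intensity state, so the belief settles well toward 'Low'.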

Figure 3. Tracking the regime of a customer call center. We show a sample path 
of $\vec\Pi$ inside the simplex $D = \{(\pi_1, \pi_2, \pi_3) : \pi_i \ge 0,\ \pi_1 + \pi_2 + \pi_3 = 1\}$, as well as the 
corresponding optimal strategy. The initial state is $\vec\Pi_0 = (0, 1, 0)$ and $\xi_0 = 1$. On this 
path we have $t \in [0, 4]$ and the arrival pairs (corresponding to jumps of X, recall the $\vec\Pi$-dynamics 
in (2.13)) $(\sigma_\ell, Y_\ell)$ for $\ell = 1, 2, 3, 4$ are (0.51, 2), (0.66, 3), (1.44, 1), (2.23, 2), 
respectively. The resulting optimal strategy $\xi^*$ is color-coded: dashed line for $\xi^* = 1$, 
solid line for $\xi^* = 2$. 